Background

Many peptides (hormones and neuromodulators) function as chemical signals between the cells of multicellular organisms by acting on specific receptors on target cells. Some peptides (neuropeptides) act as neurotransmitters in neurotransmission and as peptide hormones in cell–cell communication for the endocrine regulation of target cellular systems [1]. Owing to the diversity of their primary sequences, neuropeptides, like other biologically active peptides, display extraordinary structural diversity [2] and are instrumental in numerous important biological events [1, 3–6]. The biological activity of peptides is usually mediated by G-protein coupled receptors or, in some cases, by enzyme-linked receptors (such as the insulin receptor).

By interacting with their specific receptors, these peptides function in a large number of physiological processes including feeding and body weight regulation, fluid intake and retention, pain, stress, and cognition, as well as numerous physiological functions of neurological and psychiatric relevance [1, 3–6]. For example, ocytocin is a neurohypophyseal peptide hormone which induces milk ejection and uterine contractions in mammals [7, 8]. Vasopressin, the other neurohypophyseal peptide hormone, regulates water resorption in the kidneys [9]. Galanin is involved in mediating cognition [10, 11], and corticotropin-releasing factor participates in the control of depression [12]. The octapeptide angiotensin II exerts its actions by binding to two pharmacological receptors which mediate its physiological effects such as vasoconstriction, stimulation of sympathetic transmission, cellular growth and differentiation, antiproliferation, and vasodilation [13, 14]. The family of tachykinin peptides includes substance P, neurokinin A, and neurokinin B, which are endogenous ligands implicated in pain transmission, neurological inflammation, and several neurological diseases such as Alzheimer's and Parkinson's diseases [15]. Interestingly, the same neuropeptide often functions both as a neurotransmitter in the nervous system and as a peptide hormone in peripheral endocrine systems [36]. Indeed, enkephalins function as neurotransmitters and are also involved in peripheral actions, including regulation of intestinal motility and immune cell functions [16]. Similarly, ACTH (adrenocorticotropin hormone) functions as a neuromodulator in the brain and as a peptide hormone in the adrenal cortex, where it controls glucocorticoid production [17, 18]. To exert their physiological functions, all of these bioactive peptides should adopt different conformations during their biological lifetime [2]. 
In fact, three events affect the fate of proteins and peptides after their synthesis: sorting to subcellular localization sites, processing/degradation, and other post-translational modifications [19]. All these events are notably controlled by partial amino acid sequences (or domains) that are recognized as signals by specific molecular machinery within the cell [2, 20–22].

Many processing reactions in cells involve several different proteases and peptidases [23]. Among these proteases are the proprotein convertases (PCs), which process protein and peptide precursors (proproteins and propeptides) trafficking through the secretory pathway [24–29]. Indeed, several proteins and peptides are synthesized as inactive precursors which, when converted to their mature forms by PCs, generate a large diversity of bioactive proteins and peptides within the central nervous system as well as in endocrine cells [1, 3–6]. The PCs are a family of seven subtilisin/kexin-like endoproteases including furin, PC1/3, PC2, PC4, PACE4, PC5/6, and PC7 [24–27]. The structures of these serine proteases resemble those of both the bacterial subtilisins and yeast kexin [30]. Generally, these endoproteases cleave the precursor substrates at the C-terminal side of single, paired, or tetrabasic amino acid residues within the consensus motif [R/K]–[X]n–[R/K]↓, where X indicates any amino acid residue, R/K designates either an arginine or a lysine residue, and n (the number of spacer amino acid residues) is 0, 2, 4, or 6 [24–29]. After proteolysis by the convertases, the carboxy-terminal basic amino acids of protein/peptide intermediates are removed by specialized metallocarboxypeptidases (CPE and CPD), leading to the mature peptides [31]. In some cases, these peptides may undergo additional post-translational modifications (e.g., C-terminal amidation, N-terminal acetylation, glycosylation, sulfation, and phosphorylation) prior to the formation of the final bioactive peptides [32–34].
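
As an illustration of the consensus motif described above, the following Python sketch scans a precursor sequence for candidate sites matching [R/K]–[X]n–[R/K]↓ with n = 0, 2, 4, or 6. The function name, the one-letter sequence encoding, and the toy sequence (the ocytocin nonapeptide followed by Gly-Lys-Arg and a placeholder stretch of alanines) are assumptions made for this example. A motif match only flags a potential site; it does not indicate whether the site is actually cleaved in vivo.

```python
import re

# Spacer lengths n allowed by the consensus motif [R/K]-[X]n-[R/K]↓ quoted in the text.
SPACER_LENGTHS = (0, 2, 4, 6)


def find_candidate_sites(sequence):
    """Return sorted (cleavage_index, n) pairs for every motif match.

    cleavage_index is the 0-based index of the residue that follows the
    scissile bond (the P'1 residue); n is the spacer length of the match.
    """
    seq = sequence.upper()
    hits = set()
    for n in SPACER_LENGTHS:
        # A lookahead is used so that overlapping matches are all reported.
        pattern = re.compile(r"(?=([RK][A-Z]{%d}[RK]))" % n)
        for match in pattern.finditer(seq):
            hits.add((match.start() + n + 2, n))
    return sorted(hits)


# Illustrative toy sequence (assumption): ocytocin + Gly-Lys-Arg + placeholder alanines.
toy_sequence = "CYIQNCPLG" + "GKR" + "AAAA"
print(find_candidate_sites(toy_sequence))  # [(12, 0)]
```

The single hit corresponds to the Lys-Arg doublet, with the cut placed immediately after the arginine.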

The PCs and CPs are responsible for the processing of many precursor substrates including neuropeptides (e.g., enkephalin and dynorphin), peptide hormones (e.g., ocytocin and somatostatin), growth and differentiation factors (e.g., the bone morphogenetic protein/transforming growth factor β family), receptors (e.g., Notch and insulin receptor), enzymes (e.g., PCs and matrix metalloproteinases), adhesion molecules (e.g., α chains of integrins and collagens), blood coagulation factors (e.g., von Willebrand factor and factor IX), and plasma proteins (e.g., albumin and α1-microglobulin) [24–29]. The cleavage of many different substrates by a relatively small number of PCs may be explained by extensive overlap in the expression patterns of PCs, several of which are expressed simultaneously in all cells [24, 26, 35]. Alternatively, the PCs may recognize some regions of structure in addition to the single or paired basic residues [2, 22], or their actions may be directed by the conformation of the processing domains, which could focus the action of a protease onto particular sites [20, 21]. Nevertheless, specific enzyme–substrate couples do exist in vivo because precursors such as proglucagon [28], prosomatostatin [36] or proopiomelanocortin (POMC) [37] undergo differential processing depending on the cell type-specific expression of PCs. Moreover, the PCs differ in their activation pathway [26] since they function in the Golgi apparatus (furin and PC7), in the secretory vesicles (PC1/3 and PC2), and also on cell surfaces (PC5/6 and PACE4). In short, since substrate cleavage specificity is redundant among PCs, and yet each of them has some unique substrates in vivo [20, 21], the cleavage of precursors depends on the structural properties of recognition sites and/or the differential distribution of PCs during processing.

This review focuses on protease mechanisms for neuropeptide biosynthesis with special emphasis on the importance of privileged secondary structures and of given amino acid residues around basic cleavage sites in substrate recognition by the processing endoproteases.

Protein features involved in the processing of peptide and protein precursors

Many biologically active peptides and proteins are initially synthesized as larger, inactive precursors, generally in the form of pre-proproteins which are post-translationally modified to generate the mature molecules (Fig. 1). The N-terminal pre-region represents the signal peptide which directs the precursors to the appropriate cellular compartment, whereas the domains of the pro-region participate in the correct folding of synthesized peptides and proteins [21] and the protein transport and localization [19], or constitute recognition signals for the proteases involved in the maturation of peptide and protein precursors [2, 20, 22].

Fig. 1

Schematic representation of some neuropeptide and hormone precursors. Proteolytic processing occurs at dibasic and monobasic sites, as well as at multibasic sites. The precursor proteins may contain one copy of the active neuropeptide (proCRF or proGIP), multiple copies of the active neuropeptide (proTRH) or distinct peptide hormones (POMC or proglucagon). The bioactive molecules and the Arg(R) and Lys(K) residues are indicated by yellow colour and black bars, respectively

Proteolytic processing of peptide and protein precursors is an essential regulatory mechanism used by cells to control the level of specific bioactive polypeptides or the production of diverse molecules from a multifunctional precursor such as POMC and proglucagon (Fig. 1). Generally associated with the activation/inactivation of many peptides and proteins and with the regulation of their cellular localization, post-translational processing by limited proteolysis underpins a large number of biological phenomena such as zymogen activation, the blood coagulation cascade, prohormone processing, complement activation, and angiogenesis [2, 24–29, 36–42]. Examination of the primary sequences of many secretory proteins and native peptides or hormones (Fig. 1) indicated that: (1) each precursor possesses a distinct primary sequence, and (2) the mature neuropeptides and proteins within their precursors are generally flanked at the NH2- and/or COOH-termini by pairs of basic residues, as well as by monobasic residues. These observations imply that the conversion of precursors to active peptides occurs at cleavage sites containing at least one basic residue. For this reason, specific proteolytic processing systems have been studied by a number of workers.

By analyzing both the primary and secondary structures of 53 peptide and protein precursors, a first study [43] permitted the deduction of the following features: (1) not all the putative, potential, cleavage sites are processed in vivo, (2) the recognition of endoproteases was not correlated with the existence of a single consensus primary sequence around the cleaved sites, and (3) the processing loci are preferentially situated in, or in the immediate vicinity of, privileged secondary structures constituted by β-turns whereas the non-cleaved sites are associated with ordered structures such as α-helices or β-sheets. Similarly, a simple scheme to predict Ω-loops from protein amino acid sequences was developed and subsequently applied to the prediction of prohormone cleavage sites [44]. In another study [45], it was observed that proteolytic processing sites in seed proteins are found at sequences with a very high probability to form β-turn.

In addition to the study of secondary structures in the vicinity of dibasic cleavage sites, we have also analyzed the amino acid frequency around 352 potential dibasic sites contained in 83 propeptides and proproteins [46]. This study pointed out that the occurrence of given residues from positions P6 to P′4 was characterized by a large variability in composition and properties of residues, that no major contribution of a given precursor subsite to endoprotease specificity was observed, and that some amino acid residues appeared to occupy some positions preferentially whereas others appeared to be excluded. From these observations, we deduced that the specificity of processing proteases is dictated by the stereochemistry and flexibility of peptide substrates and that the enzyme–substrate interaction occurs through multiple anchoring on both sides of the scissile bond [43, 46]. Other research groups have proposed general observations for cleavage recognition sites based on the occurrence of amino acids appearing close to cleaved sites [25, 47, 48]. For example, four rules and five tendencies were deduced from the study of the amino acid frequency around the monobasic sites [47], and upstream basic residues at the −4 and/or −6 position were shown to affect the specificity of endoproteases [25]. However, these observations are derived from motifs that are cleaved, without knowledge of the acting PCs, and the non-cleaved motifs are typically ignored. For these reasons, several approaches have been reported to predict the potential neuropeptides generated from a precursor sequence by quantitative estimation of processing probabilities [49–53]. Indeed, different models (e.g., logistic regression, artificial neural network, and Known Motif models) have been developed by using diverse datasets generated from precursors of mollusk neuropeptides, viral and eukaryotic proteins, mammalian neuropeptides, and the RFamide family [49–53]. 
Significant differences were found between these studies, probably because of disparities in databases used to assess these approaches. Comparative analysis of neuropeptide cleavage sites in human, mouse, rat, and cattle supports this hypothesis [54]. Indeed, this study demonstrates that there are species- and precursor-specific processing patterns, indicating that amino acid and amino acid properties have a major impact on the probability of cleavage and comparable effects in these species.
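
The kind of positional frequency analysis described above can be sketched as follows. This is a minimal illustration, not the procedure of [46]: the function names, the restriction to doublets with a complete P6 to P′4 window, and the two toy sequences are assumptions introduced for the example.

```python
from collections import Counter

BASIC = set("KR")
# Subsite labels for a 10-residue window centred on a dibasic doublet (P2-P1).
LABELS = ["P6", "P5", "P4", "P3", "P2", "P1", "P'1", "P'2", "P'3", "P'4"]


def dibasic_windows(sequence):
    """Yield the P6..P'4 window around each basic doublet in the sequence.

    Doublets too close to either end to give a full window are skipped.
    """
    seq = sequence.upper()
    for i in range(len(seq) - 1):
        if seq[i] in BASIC and seq[i + 1] in BASIC:
            start, end = i - 4, i + 6
            if start >= 0 and end <= len(seq):
                yield seq[start:end]


def position_frequencies(sequences):
    """Count residue occurrences at each subsite over all dibasic windows."""
    counts = {label: Counter() for label in LABELS}
    for seq in sequences:
        for window in dibasic_windows(seq):
            for label, residue in zip(LABELS, window):
                counts[label][residue] += 1
    return counts


# Hypothetical toy sequences, used only to show the shape of the output.
counts = position_frequencies(["ACDEKRFGHI", "LLLLRRFAAA"])
print(counts["P'1"])  # residue counts observed at the P'1 subsite
```

Comparing such counts between cleaved and non-cleaved sites is what allows preferred or excluded residues to be identified; with real data, the labels of cleaved versus non-cleaved sites would have to come from experimental annotation.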

For these different reasons, several studies, using different experimental approaches, were undertaken in order to elucidate the molecular mechanism involved in the processing of peptide and protein precursors [26, 55]. To establish which protein features are responsible for the recognition of dibasic cleavage sites by their corresponding processing proteases, we have used as models the common precursor of the ocytocin and neurophysin (pro-OT/Np) and the somatostatin precursor (pro-Som).

Post-translational modification and secondary structures: model of pro-ocytocin/neurophysin

As shown in Fig. 2a, the precursor of ocytocin/neurophysin exhibits the N-terminal ocytocin sequence (OT) separated from the C-terminal neurophysin domain (Np) by a “restriction” sequence Gly-Lys-Arg which is excised during proteolytic conversion and subsequent amidation. The eicosapeptide OT/Np(1–20) (Fig. 2a), corresponding to the processing domain encoded by exon I in the pro-OT/Np gene, was predicted to organize as a β-sheet/β-turn/α-helix arrangement [55].

Fig. 2

Post-translational processing of proocytocin/neurophysin (OT/Np). a Schematic representation of the ocytocin–neurophysin precursor; the OT/Np(1–20) amino-terminal sequence of OT/Np is expanded. b Stereo views of the molecular models of OT/Np(7–15) and OT/Np(7–20) peptides [58]

Evidence for the presence of β-turn in the vicinity of dibasic cleavage sites

Among the different OT/Np peptides reproducing or mimicking the dibasic cleavage site of pro-OT/Np [55], those bearing the Pro7-Leu15 sequence were shown to be cleaved with high efficacy (Table 1), indicating that the tetrapeptide Pro7-Leu-Gly-Gly10, predicted to adopt a β-turn structure in the processing domain [43], is essential. The solution conformation of those peptides, determined by different spectroscopic techniques, supported this hypothesis [55]. Indeed, the circular dichroism (CD) spectra of peptides OT/Np(7–15), OT/Np(7–20), OT/Np(8–15), and OT/Np(8–20) (Fig. 3) are indicative of a conformational equilibrium between aperiodic structures and folded conformations (β-turn and α-helix) according to the classification of Woody [56, 57]. Moreover, NMR analysis of these peptides confirmed the above observations [58], and energy minimization methods permitted the construction of molecular models which emphasize the structural organization of each peptide segment (Fig. 2b). In particular, the NH2-terminal region of peptides OT/Np(7–15) and OT/Np(7–20) involves a type II β-turn starting from residue Pro at position 7. These data, supported by the study of the processing domain OT/Np(1–20) [59], reveal the presence of a β-turn structure in the vicinity of the dibasic cleavage site.

Table 1 Kinetic parameters for the cleavage of OT/Np-related peptides adopting β-turn, β-sheet, or α-helix conformations [46]
Fig. 3

CD spectra of proocytocin/neurophysin peptide analogues. Far-UV CD spectra of OT/Np(8–15) (curve 1), OT/Np(7–15) (curve 2), OT/Np(8–20) (curve 3) and OT/Np(7–20) (curve 4) peptides in 50% TFE [58]

These conclusions, confirmed by different studies conducted on the OT/Np [60], Adipokinetic hormone [61] and insulin [62] precursors, support the concept that β-turn structures constitute recognition signals for the processing endoproteases [2, 43].

Role of β-turn structures in the endoproteolytic cleavage of substrates

Since the recognition of dibasic sites by processing endoproteases was not correlated with the existence of a consensus primary sequence, replacement of the sequence Pro7-Leu-Gly-Gly10 by non-homologous peptide stretches, known to organize as β-turn structures in proteins, could not abolish the endoproteolytic cleavage of peptide derivatives. As shown in Table 1, the modified peptides [[S7–G10]OT/Np(7–15), [Y7–Q10]OT/Np(7–15), and [N7–A10]OT/Np(7–15)], as well as the reference peptide OT/Np(7–15), were cleaved without important effects on their kinetic parameters [63]. Analysis of solution conformations by CD confirmed that these nonapeptides possess the propensity to organize in β-turn conformers according to the classification of Woody [56, 57]. Together, these observations supported the hypothesis that the proteolytic processing loci share, in their vicinity, β-turn structures as a common structural feature that is interchangeable.

Since the β-turn structures appeared to be essential for the cleavage of substrates by the processing proteases, their replacement by ordered structures might affect the enzymatic reaction [63]. Compared to the reference peptides [OT/Np(7–15), OT/Np(7–20), and OT/Np(1–20)], the peptides designed to promote formation of β-sheet [[I7–L10]OT/Np(7–15)] or α-helix [[I7–L10]OT/Np(7–20) and [A3–V10]OT/Np(1–20)] structures are essentially characterized by high values of Km (low affinity) (Table 1). Analysis of solution conformations of these peptide substrates indicates that the CD spectrum of the peptide [I7–L10]OT/Np(7–15) is typical of β-sheet conformation, whereas, for the peptides [I7–L10]OT/Np(7–20) and [A3–V10]OT/Np(1–20), the shape of their CD spectra is characteristic of α-helical folding [63]. Together, these data indicate that the turn structures, at the cleavage loci, are the major determinant of substrate affinity.
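
The link between a high Km and low apparent affinity can be made concrete with the Michaelis–Menten rate law, v = Vmax[S]/(Km + [S]). The sketch below assumes that the peptide substrates follow simple Michaelis–Menten kinetics and uses purely hypothetical parameter values; they are not the values reported in Table 1.

```python
def michaelis_menten_rate(substrate_conc, vmax, km):
    """Initial rate v = Vmax*[S] / (Km + [S]).

    substrate_conc and km must be expressed in the same concentration units.
    """
    if substrate_conc < 0 or km <= 0:
        raise ValueError("substrate_conc must be >= 0 and km must be > 0")
    return vmax * substrate_conc / (km + substrate_conc)


# Hypothetical values (assumptions): same Vmax and [S], ninefold difference in Km.
substrate = 50.0
rate_low_km = michaelis_menten_rate(substrate, vmax=1.0, km=50.0)
rate_high_km = michaelis_menten_rate(substrate, vmax=1.0, km=450.0)
print(rate_low_km, rate_high_km)  # the high-Km substrate is cleaved more slowly
```

At a fixed substrate concentration and Vmax, raising Km lowers the fraction of enzyme engaged with substrate, which is why an increase in Km is read as a loss of affinity.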

Put into an enzymatic context, the presence of reverse turns, around the proteolytic loci, and not the existence of a consensus primary sequence, suggests that the prohormone convertases share a common mechanism which allows them to cleave a large variety of distinct precursors [2, 43].

Role of substrate dynamics in the kinetics of the dibasic cleavage sites

To be cleaved with a high efficiency, the substrates should have the ability to reorganize the local environment of their subsites in order to interact with the dibasic specific convertases in an optimal manner. This was clearly shown by a study indicating that replacement of the P′1 residue by various amino acid residues changed both the cleavage rate of the OT/Np(7–15) segment and its affinity for the protease [46]. Since those OT/Np(7–15) peptide analogs were shown to be characterized by the same conformation, the dynamics of substrates must therefore play a role in modulating their proteolysis. So, the plasticity and peptide motions of OT/Np(7–15) peptides, bearing Phe, Tyr, and Trp amino acid residues at the P′1 position, were evaluated by measuring the fluorescence properties of these aromatic residues [64].

As shown in Fig. 4, the fluorescence decay populations of each fluorophore evolve differently with temperature, indicating that internal motions of the OT/Np(7–15) peptide are modulated by the nature of P′1 residues. To estimate these parameters quantitatively, the fluorescence quenching of these peptide substrates was assessed in the presence of the same collisional quencher. Data in Fig. 5 show that the quenching of aromatic residues (free or inserted in peptides) exhibits the same linear dependence, indicating that the residues at the P′1 position have the same exposed position in the peptide structure. Since the slope of the straight line obtained for each peptide was smaller than that found for the free amino acid, the accessibility of P′1 residues depends upon peptide substrate dynamics, which could be channelled and differently regulated by the associated proteases. The logical conclusion of these observations is that the presence of β-turn structures in the vicinity of basic doublets permits not only discrimination between the functional sites and those that are not cleaved [2, 43] but also modulation of the cleavage efficiency of the sites processed in vivo.
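
The text does not spell out the relation used to obtain the quenching rate constant. Assuming a standard Stern–Volmer treatment of purely collisional (dynamic) quenching, the quantities involved are related as shown below, where F0 and F are the fluorescence intensities without and with quencher, [Q] is the quencher concentration, τ0 is the unquenched lifetime, and K_SV is the Stern–Volmer constant (symbols other than kq, T, and η are introduced here for illustration):

```latex
\frac{F_0}{F} = 1 + K_{SV}\,[Q] = 1 + k_q\,\tau_0\,[Q],
\qquad
k_q \propto \frac{T}{\eta} \quad \text{(diffusion-limited quenching)}
```

Under this assumption, a plot of kq against T/η is linear, and a smaller slope for a residue inserted in a peptide than for the free amino acid reflects reduced accessibility of the fluorophore to the quencher.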

Fig. 4

Fluorescence decay time distribution of Trp, Tyr and Phe residues inserted in the OT/Np(7–15) peptide (W13, Y13, F13), as a function of temperature. The free residues (W, Y, F) are characterized by a single population [64]

Fig. 5

T/η dependence of the rate constant kq for collisional quenching of Trp, Tyr and Phe either inserted in the OT/Np(7–15) peptide (W13, Y13, F13) or as free residues (W, Y, F) in solution [64]

The role of substrate dynamics in limiting the proteolysis of precursors [64, 65] may explain the preference for some amino acid residues at specific subsites [25, 46, 66, 67]. For example, based on the data obtained for both the occurrence frequency of amino acid sequences flanking 352 dibasic moieties and the processing of the OT/Np(7–15) substrate bearing various amino acid residues at the P′1 site, we observed that most amino acid residues occupying the P′1 position in the precursor cleavage sites are tolerated [46]. Similarly, upstream basic residues at the −4 and/or −6 position were shown to affect the specificity of PCs and particularly of yeast Kex2 and human furin, which differ in recognition of P4 and P6 residues [25, 66]. Based on data obtained with furin-like substitutions in the putative S4 and S6 subsites of Kex2, a “cross-talk” was observed between these subsites, allowing the wild type and mutant forms of Kex2 to adapt their subsites for altered modes of recognition [66]. This apparent plasticity, not seen in furin, allows the subsites to rearrange their local environment to interact with different substrates in a productive manner [66]. Moreover, coexpression in LoVo cells of integrin α4 with each PC revealed that the processing of pro-α4 is best performed by furin at the H592VISKR597↓ST site (arrow indicates the cleavage site) [67], which is different from the accepted furin processing motif R–(X)n–(K/R)–R↓ [46]. Since this processing occurs preferentially at acidic pH, the presence of a histidine residue at position P6 restricts the cleavage of pro-α4 by furin to acidic compartments. Finally, one of the striking features of proprotein processing is that in vivo cleavage occurs preferentially at LysArg and ArgArg doublets and less often at ArgLys and LysLys [26, 43]. 
Statistical analysis of several potential dibasic cleavage sites reveals large differences in the distribution of basic doublets when the in vivo cleaved sites were compared to those which are not cleaved [68]. Analysis of both the substrate specificity of Kex2 towards the pro-OT/Np(7–15) processing domain with altered basic pairs and the secondary structures of these substrates indicates the in vivo cleavage hierarchy of dibasic sites is encoded by both the nature of basic pairs and the plasticity of proteolytic processing domains [68].

From the enzymatic point of view, these experiments reveal that internal flexibility of peptide substrates dictates the kinetics of their hydrolysis at dibasic sites, providing a rationale for the understanding of the existence of a rather limited number of prohormone convertases [2, 43].

Conclusion

This section highlights the functional roles of β-turn structures in the proteolytic processing of prohormones. These particular secondary structures are known to be largely involved in numerous biological processes (glycosylation, phosphorylation, amidation, protein targeting to specific organelles, and cleavage of signal peptide), often being the bioactive structure that interacts with another molecule (e.g., receptors, enzymes, antibodies, etc.) within a conformational population [2, 69]. Within proteins, they tend to be more solvent exposed and therefore, as accessible sites, they are involved in molecular recognition [70–72]. This tendency to be solvent exposed gives them more flexibility, a property critical in protein and peptide functions. The functional role of β-turns in molecular recognition explains the great interest in mimicking these secondary structures for the synthesis of medicines in the field of medical and pharmacological chemistry [2].

Differential processing of prohormones: model of prosomatostatin

As shown in Fig. 1, the precursors of neuropeptides as represented by procorticotropin-releasing factor and proinsulin contain a single copy of the active neuropeptides whereas certain precursors such as prothyrotropin-releasing hormone (proTRH) contain multiple related copies of the active thyrotropin-releasing hormone peptide [73]. In some cases, more than one biologically active peptide is generated from the same precursor in a tissue-specific manner. For example, the POMC precursor generates ACTH in the anterior pituitary or the hormones α-MSH (α-melanocyte-stimulating hormone) and β-endorphin in the intermediate lobe of pituitary [74]. Besides the POMC precursor, proglucagon is another archetype of multi-functional precursor which undergoes a differential proteolytic processing to generate a variety of regulatory peptides with a large palette of activities [28]. Indeed, in the intestinal L cells, maturation of proglucagon generates four different peptides (glicentin, oxyntomodulin, glucagon-like peptide-1, and glucagon-like peptide-2), which display different biological activities [28]. In contrast, only glucagon and a fragment of glucagon (miniglucagon) are produced from this precursor in the α cells of the endocrine pancreas [28, 75].

This tissue-specific processing of polyfunctional precursors is a function of both the differential distribution of PCs in subcellular compartments where the prohormone processing takes place and the differential ability of PCs to cleave the dibasic sites within the precursors. The biosynthesis of somatostatin follows these rules. Indeed, prosomatostatin is a relatively small biosynthetic precursor which undergoes monobasic and dibasic cleavages to release two functional hormones, i.e., the somatostatin-14 (S-14) and the somatostatin-28 (S-28) (Fig. 6).

Fig. 6

Post-translational processing of prosomatostatin in either anglerfish or mammals. The S-28(1–12) amino-terminal sequences of S-28 are expanded

Prosomatostatin and its derivatives

Somatostatin is a 14-amino acid peptide widely distributed in the central nervous system and peripheral tissues in which it has diverse physiological actions. This peptide acts as a neuromodulator or a neurotransmitter in the central nervous system, as a regulatory hormone in the gastrointestinal tract and pancreas and as a release inhibitor of growth hormone and thyroid stimulating hormone in the pituitary [76]. Somatostatin exerts its biological effects through interactions with five distinct receptor subtypes that belong to the family of G-protein coupled receptors [76]. It has been demonstrated that the number and the distribution of somatostatin receptors vary from one tissue to another [76]. However, this variation is not the unique source of the diverse biological actions of somatostatin. Indeed, somatostatin is synthesized as a large precursor molecule that is proteolytically processed to generate several mature peptides.

In mammals, the prosomatostatin is cleaved predominantly at the C-terminal region to result in mainly two bioactive peptides, i.e., S-14 and S-28. The amount of these peptides has been found to vary according to the tissue origin. Indeed, S-14 is predominant in endocrine and neuronal tissues whereas S-28 is prevalent in peripheral tissues [76]. In fish, there are two separate genes encoding for two distinct precursors (Fig. 6), the prosomatostatin I which generates S-14 and the prosomatostatin II which generates anglerfish S-28 and catfish S-22 [77]. It is important to note that the S-28 encoding gene is restricted to the pancreatic islets whereas the S-14 encoding gene is widely distributed in many cells including pancreas. A novel somatostatin-like gene, called cortistatin (CST), has been recently identified in human and rat [78]. The identified cleavage sites allowed the generation of CST-17 in human and CST-14 in rat, and of CST-29, a common peptide for human and rat.

Production of S-28 and S-14 is both quantitatively and qualitatively variable from tissue to tissue and sometimes from cell type to another. Therefore, the proteolytic processing of prosomatostatin by the different PCs was evaluated [79]. Coexpression of this precursor and each PC in endocrine and non-endocrine cells showed that prosomatostatin is processed at its dibasic site by PC1/3 and PC2 and at its monobasic site more likely by PACE4 and furin [79]. In addition, by using synthetic peptides reproducing the dibasic cleavage site of prosomatostatin, an enzyme called N-arginine dibasic convertase (NRDc) has been identified and cloned [80]. Moreover, it was shown that prosomatostatin is also proteolytically processed at the amino-terminal segment by subtilase SKI-1 [81].

This differential tissue-specific processing of prosomatostatin might be controlled by two distinct proteases, each one involved in the recognition of a specific site, and/or by localization of the corresponding enzymes in different compartments of the secretory pathway [24, 26]. Alternatively, the action of processing enzymes may be directed by conformation of the processing domain which could focus the action of a protease onto or away from a particular site [2, 40].

Role of β-turn in in vivo processing of prosomatostatin at the ArgLys doublet

In the somatostatin precursor, the dodecapeptide segment S-28(1–12), corresponding to the NH2-terminal sequence of S-28, separates the two basic loci (Fig. 6). Secondary structure prediction on this connecting region reveals the presence of several β-turn structures [2, 40]. To test the importance of these secondary structures in prosomatostatin processing, different mutants were constructed in which the sequence Pro−5-Arg-Glu-Arg−2, involved in β-turn formation [2, 40], was partially or totally substituted. Analysis of the processing efficiencies observed with both the non-mutated precursor and the prosomatostatin mutants in transfected Neuro2A cells (Table 2) indicated that substitution of Pro−5 by an α-helix promoting amino acid residue ([A−5] mutant) abolished cleavage at the dibasic site [63, 82]. In contrast, replacement of Pro−5 by a β-turn “former” residue ([G−5] mutant) or of the sequence Pro−5-Arg-Glu-Arg−2 by non-homologous peptide stretches ([S−5–N−3] and [Y−5–G−3] mutants), known to organize as β-turn structures in proteins, did not affect prosomatostatin processing [89].

Table 2 Effects of various mutations on human prosomatostatin processing in transfected Neuro2A cells [89]

Structural analysis of peptides reproducing these mutations supported these in vivo data. According to the classification of Woody [56, 57], the CD spectra exhibited by the reference peptide Som(−9;+5) and its derivative [A−5]Som(−9;+5) are typical of an equilibrium between α-helix, aperiodic structures and another component of the β-turn type. However, the CD spectrum of the [A−5]Som(−9;+5) peptide shows a large variation in the ellipticity value of the band transition at 190 nm (contribution from α-helix), and its second-derivative infrared (IR) spectrum exhibits a significant increase of the band at 1,657 cm−1 (band generally attributed to α-helix conformation) with a concomitant decrease of other secondary structure contributions [56, 57]. In contrast, the spectral patterns of peptides [G−5]Som(−9;+5), [Y−5–G−3]Som(−9;+5) and [S−5–N−3]Som(−9;+5) indicate an increased amount of β-turn and unordered conformations, in agreement with the profile of their computed second-derivative IR spectra [56, 57].

Since only replacement of Pro−5 by Ala impairs S-14 production (Table 2) in parallel with β-turn disruption, these data argue in favor of a role for the β-turn in the in vivo processing of prosomatostatin at the Arg-Lys doublet [83, 84].

Role of Pro-(Xaa)3-Pro motif in the conformation of S-28(1–12) domain

Although the prosomatostatin mutants [G−5], [S−5–N−3], and [Y−5–G−3] are cleaved at both cleavage sites, the proportions of S-14 and S-28 recovered from their cell extracts differed from those found for the wild type (Table 2). Given that S-14 and S-28 were shown to be processed independently from prosomatostatin in Neuro2A cells [85] and in islet somatostatin tumor cells [36], these observations suggest that other structural features or specific domains are involved in the generation of equal amounts of S-28 and S-14 from their common precursor [84].

As shown in Fig. 6, the S-28(1–12) sequence includes two Pro residues, which are known to play a special role in the structures [86] and functions [87] of proteins. Moreover, these Pro residues are arranged as a Pro-(Xaa)3-Pro pattern, one of the most frequent motifs in proteins [86]. Therefore, several prosomatostatin mutants were constructed in which the Pro-Ala-Met-Ala-Pro motif was deleted (partially or totally) or its size varied. As shown in Table 2, large differences were observed in the processing efficiencies of these mutants. Indeed, the S-28/S-14 ratio, which was 1 in cells expressing the non-mutated precursor, was raised for the [P−9, P−6] and Δ[PP] mutants. Moreover, while the increase in cleavage at the monobasic site was accompanied by a decrease in cleavage at the dibasic site, a decrease was also observed in the overall processing efficiency of these mutants (Table 2). The structural effects induced in the conformation of these mutants were investigated by the AGADIR method [88]. As indicated in Fig. 7a, deletion of the Pro residues (motif XAMAX) or a shift of Pro−5 (motif PAMPA) increased both the helicity values per residue and the size of the domain containing the dibasic site, i.e., Som(Asn−6-Asn+5). Since Pro−5 is highly conserved in the primary sequence of prosomatostatin from various species, this amino acid residue also appears to play a role in the correct folding of the prosomatostatin processing domain [89]. These conclusions were supported by the results obtained for the processing of the prosomatostatin mutants Δ[AMA] and [P−6, P−5]. Indeed, deletion of the tripeptide Ala-Met-Ala or a shift of Pro−9 essentially reduced the S-28/S-14 ratio in the corresponding mutants (Table 2). Conformational analysis of these mutants by the AGADIR method reveals that the differences observed in their processing were accompanied exclusively by a decrease in the helicity values per residue of the dibasic site-containing domain (Fig. 7a).

Fig. 7 Secondary structure prediction of the human prosomatostatin sequence (Arg−20; Asn+5). Helicity per residue calculated for peptides in which (a) Pro residues or the A-M-A tripeptide were deleted and (b) Pro−5 and/or Pro−9 were replaced by Ala [89]

A final demonstration of the importance of the Pro-Ala-Met-Ala-Pro pattern was provided by examination of prosomatostatin mutants in which the Pro residues were mutated. As shown in Table 2, replacement of Pro−9 or Pro−5 by Ala (an α-helix-promoting residue) selectively and almost completely abolished cleavage of the precursor at the monobasic ([A−9] mutant) or the dibasic ([A−5] mutant) site, respectively. In contrast, substitution of both Pro residues did not completely impair cleavage of the mutant [A−9, A−5] at the monobasic site. Analysis of helicity profiles by the AGADIR method (Fig. 7b) indicated that substitution of Pro residues by Ala favored the extension of an α-helix towards the dibasic site (motifs PAMAA and AAMAA) or the monobasic site (motif AAMAP). As indicated in Table 3, the percentage values of α-helix estimated from the CD and IR spectra of synthetic peptides corresponding to these mutations supported the data obtained by the AGADIR method [88]. Together, these results underline the respective role of each Pro residue in both the stability and the precise location of the helical structure adopted by the tripeptide Ala-Met-Ala [84]. This is consistent with the observation that deletion of the motif Pro-Ala-Met-Ala-Pro did not significantly decrease the cleavage of the prosomatostatin mutant Δ[PAMAP] but rather increased the value of the S-28/S-14 ratio (Table 2).

Table 3 Amino acid sequences of prosomatostatin-related peptide substrates [89]

From the present data emerges the concept that the Pro-Ala-Met-Ala-Pro stretch is a helix-promoting seed whose integrity is essential for alternative prosomatostatin processing at both basic cleavage sites.

Functional role of S-28(1–12) domain in the processing of prosomatostatin

Although the preceding data emphasize the functional role of the Pro-Ala-Met-Ala-Pro motif in the differential processing of human prosomatostatin, its presence alone cannot explain the post-translational processing of prosomatostatin in other species [84, 89]. Indeed, mature S-14 derives from anglerfish prosomatostatin I (Fig. 6), which shares the pattern Pro-Ala-Met-Ala-Pro with the human precursor. Similarly, the S-28 hormone is released from anglerfish precursor II (Fig. 6), which contains the Pro-Pro motif as in the human prosomatostatin mutant [P−6, P−5] (Table 2). To define the structural basis responsible for these differences, the secondary structure of the Som(Xaa−20-Asn+5) sequence was explored in each species by the AGADIR method.

Analysis of the data in Fig. 8a indicates that the source of these differences resides essentially in the helicity values per residue of the monobasic site-containing domain, i.e., the sequence Som(−20;−10). Based on the data obtained for the human prosomatostatin mutants (Table 2), these results argue in favor of a functional relationship between the helicity ratio Rh [Rh = total helicity values of fragment (−20;−10)/total helicity values of fragment (−6;+5)] and the proportions of S-14 and S-28 generated from each somatostatin precursor. Indeed, the value of this parameter was 1.3, 3.9, and 0.1 for the human, anglerfish I, and anglerfish II precursors, respectively (values in the inset of Fig. 8a). This implies that other domain(s) also participate in the differential processing of these prohormone molecules. Such an interpretation is consistent with the observation that transfer of the S-28(1–5) sequence (Fig. 6) from one species to another allows each prosomatostatin species to mimic the other (Fig. 8b–d).
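The helicity ratio Rh defined above can be illustrated with a minimal Python sketch. It assumes that a per-residue helicity profile (for example, one exported from a helicity predictor such as AGADIR) is available as a mapping from residue position to helicity value; the function name, the data layout, and the numbers in the example profile are hypothetical placeholders and are not taken from Fig. 8.

```python
def helicity_ratio(helicity, mono=(-20, -10), di=(-6, 5)):
    """Compute Rh = total helicity of the monobasic-site fragment divided by
    total helicity of the dibasic-site fragment.

    helicity: dict mapping residue position (e.g. -20 ... -1, +1 ... +5)
              to a predicted helicity value per residue.
    mono, di: inclusive (first, last) positions of the two fragments;
              the defaults follow the fragments (-20;-10) and (-6;+5).
    """
    def total(first, last):
        # Sum the helicity of every residue whose position lies in the range.
        return sum(v for pos, v in helicity.items() if first <= pos <= last)

    denominator = total(*di)
    if denominator == 0:
        raise ValueError("dibasic-site fragment has zero total helicity")
    return total(*mono) / denominator


# Hypothetical profile (placeholder values, NOT data from the review):
# positions -20..-1 and +1..+5, with no position 0.
example_profile = {}
for pos in list(range(-20, 0)) + list(range(1, 6)):
    if -20 <= pos <= -10:
        example_profile[pos] = 2.0   # monobasic-site domain
    elif pos >= -6:
        example_profile[pos] = 1.0   # dibasic-site domain
    else:
        example_profile[pos] = 0.5   # residues -9..-7 (not used in Rh)

example_rh = helicity_ratio(example_profile)
print(f"Rh = {example_rh:.1f}")
```

With real predicted profiles, a value of Rh well above 1 would correspond to a precursor whose helicity is concentrated in the monobasic site-containing domain, and a value well below 1 to one whose helicity is concentrated around the dibasic site.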

Fig. 8 Secondary structure prediction of the prosomatostatin sequence (−20;+5) in either anglerfish or mammals. Helicity per residue calculated for the (−20;+5) sequence derived from (a) the human, anglerfish I, and anglerfish II prosomatostatin, (b) the human prosomatostatin in which the S-28(1–5) sequence was replaced by either the AASGG (anglerfish I) or the SVDST (anglerfish II) sequence, (c) the anglerfish I prosomatostatin in which the S-28(1–5) sequence was replaced by either the SANSN (human) or the SVDST (anglerfish II) sequence, and (d) the anglerfish II prosomatostatin in which the S-28(1–5) sequence was replaced by either the AASGG (anglerfish I) or the SANSN (human) sequence. The values calculated for the parameter Rh are indicated in the inset [89]

In light of these results, the structural features involved in the correct processing of prosomatostatin precursors reside necessarily in the S-28(1–12) segment [84, 89].

Conclusion

Post-translational processing is an essential mechanism that leads, through the action of prohormone convertases, from a single precursor to a series of regulatory molecules with a large palette of activities. This is illustrated here by the study of prosomatostatin, which underlines the functional role of the S-28(1–12) sequence in controlling the rate at which bioactive molecules are generated along selective, differential metabolic pathways. In higher species such as mammals, the function of the connecting peptide S-28(1–12) is accomplished by the Pro-Ala-Met-Ala-Pro motif, which contributes to maintaining an adequate conformation recognized by the specific prohormone convertases to generate the normal S-28/S-14 ratio. In contrast, in lower organisms such as anglerfish, the use of two separate precursors represents a less developed mechanism in which the release of each somatostatin molecule is essentially under the control of the structure of the monobasic site-containing domain.

Functional roles of protein convertases in health and diseases

Regulation of proneuropeptide-processing enzymes is an essential and common mechanism by which cells achieve more effective processing of prohormones and propeptides into mature molecules [24–29]. Indeed, the PC family plays a crucial role in a variety of physiological processes, such as embryonic development and neural function, by cleaving many functionally important cellular proteins, including hormones, neuropeptides, growth factors, metalloproteinases, and signalling receptors, into their respective mature forms [24–29]. The physiological role of PCs has been examined using knockout mice, and disruption of the expression of their genes was observed to result in many in vivo defects such as abnormal embryonic development, hormonal disorders, infertility, and/or modified lipid/sterol metabolism [38–42]. For example, the absence or dysfunction of furin or PC5/6 is lethal at early embryonic stages [90, 91]. This is likely due to the absence of processing of several molecules (e.g., members of the transforming growth factor β family) reported to play crucial roles during development. In contrast, PC1/3 and PC2 knockout mice are viable but display hormonal and/or neuroendocrine deficiencies [39, 40]. Indeed, alterations in the expression of PC1/3 and PC2 have profound effects on neuropeptide homeostasis because PC1/3 and PC2 are essential for the processing of a variety of proneuropeptides such as proenkephalin, prosomatostatin, proneurotensin, proneuropeptide Y, and POMC [28, 29, 35–37]. Moreover, PC4 null mice are infertile because this protease is involved in the processing of precursor proteins required for normal fertility [40, 92]. Among the many PC substrates expressed in testicular germ cells, some, such as fertilins, insulin-like growth factor-1, and transforming growth factor β, were shown to be important in reproduction.

Proprotein convertase signalling pathways are strictly regulated, and the deregulation of their activity can therefore lead to various pathologies such as neurological disorders, cancer, viral infections and bacterial pathogenesis, diabetes, and atherosclerosis [27, 38, 39, 93–96]. For example, the PCs have been linked to Alzheimer’s disease through the zymogen activation of certain proteases (α- and β-secretases) implicated in the processing of the amyloid protein precursor (APP). APP is proteolytically processed by α-, β-, and γ-secretases via two distinct processing pathways [95]: the major physiological route, in which α-secretase cleaves APP within its amyloid domain (Aβ) to generate non-toxic fragments, and the amyloidogenic pathway, in which the β- and γ-secretases are the major protagonists in the generation of neurotoxic Aβ from APP. It has been shown that cleavage of pro-BACE (the precursor of β-secretase) by furin and other PCs increases APP processing [97, 98]. Conversely, overexpression of PC7 or NRDc decreases Aβ production by enhancing the α-secretase cleavage of APP through activation of the disintegrin metalloproteases [99–101]. Therefore, activation of β-secretase (overexpression of furin) or inactivation of α-secretase (inhibition of PC7) enhances the production of amyloidogenic peptides [95].

The involvement of PCs in tumorigenesis was deduced from the localization of some PCs (furin, PACE4, PC5, and PC7) in several different tissues and in epithelial or nervous system tumors [38, 39, 93]. Overexpression of these PCs enhanced tumorigenesis and the aggressiveness of tumor cells via augmented processing and activation of various molecules involved in tumorigenesis and metastasis [93, 102]. These include growth factors, growth factor receptors, adhesion molecules, and metalloproteases that are substrates of PCs [103, 104]. Inhibition of PC activity in various tumor cells resulted in reduced processing of these cancer-associated substrates [105, 106]. Hence, the PCs influence tumor cell proliferation, motility, adhesiveness, and invasiveness by controlling the maturation/activation of key cancer-related proteins.

In the case of viral and bacterial infections, a variety of bacterial toxins (e.g., diphtheria, botulinum, and anthrax toxins) and viruses (e.g., HIV-1, flaviviruses, Marburg and Ebola viruses) exploit host PCs to allow entry into host cells and to cause disease onset [96]. This process occurs through the activation of these toxins and viral proteins by PCs, which renders them fully functional [107–109]. Indeed, inhibition of the processing of these viral proteins by PC inhibitors completely abrogated the induced cellular cytopathicity [27, 110–112]. Thus, the infectious capacity of viruses and bacteria requires the presence of host PCs to process their glycoproteins and toxins, which are produced as inactive, unprocessed forms.

From a medical and biotechnological viewpoint, the protein convertases constitute potential drug targets to control the production of peptides involved in these diseases [27, 38, 39, 110–112].

Protein convertases as potential therapeutic targets

The features of the primary structure of yeast kexin, human furin, and other human PCs are remarkably conserved. All PCs contain a subtilisin-related catalytic domain, a conserved P-domain, and a variable domain, which in some PCs is followed by an additional C-terminal trans-membrane domain and a short cytoplasmic domain [27]. Moreover, the spatial arrangement of the catalytic and P domains of soluble forms of mouse furin and yeast kexin has been elucidated from their crystal structures [113–115]. Based on these experimental data, models of the other PCs were generated by homology modeling techniques [30, 115] in order to derive the structural determinants that may help to explain their stringent substrate-specificities [24–29, 43–54, 66–68].

According to topology and structure-based sequence comparisons, these modeling studies showed that all PCs exhibit a significantly higher similarity to furin than to kexin, with PC4, PACE4, and PC5/6 being more similar to furin, whereas PC1/3, PC2, and PC7 are less similar. This order of similarity also holds for the substrate-binding domain, which exhibits several negatively charged amino acid residues that allow PCs to recognize and process substrates at multiple basic residues. Indeed, furin possesses the highest number of negative charges (16 acidic residues); PC4, PACE4, and PC5/6 exhibit the same number of negative charges (15); PC1/3 and PC7 resemble kexin with 13 acidic residues; and PC2 displays the lowest number of acidic residues (11). These findings indicate that the interplay between the number of negatively charged residues in the active site of PCs and the number of basic substrate groups dictates their preference for distinct substrates. In addition to differences in the total negative charge of their catalytic domain, the PCs also differ in the geometry and the charge distribution of their substrate-binding regions. For example, all PCs share a virtually identical S1 pocket, which exclusively accommodates a P1 Arg residue. The geometry of the S2 subsites allows the PCs to prefer a P2 Lys residue, but also to accommodate other amino acid residues, as in furin. For all PCs except PC1/3 and PC2, the architecture of the S4 subsites (characteristic of furin) favors basic residues at P4. The less stringent requirement of PC1/3 and PC2 for basic residues might result from the lower negative charge accumulation near their S4 sites or from slightly modified S4 sites. Likewise, furin, PC4, PACE4, and PC5/6 seem to favor the binding of additional basic residues at P3, P5, P6, and/or P7, whereas PC1/3, PC2, and kexin accept other residues. Moreover, all PCs should prefer polar/acidic residues at P1′ and hydrophobic residues at P2′, disfavoring basic residues at these subsites. In agreement with many experimental results [27, 38, 39, 110–112], these findings indicate that the preference of PCs for basic residues at P1, P2, and beyond parallels the increased number of negative charges in or around their substrate-binding subsites.
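The subsite nomenclature used above (P4 to P1 on the N-terminal side of the scissile bond, P1′ and P2′ on the C-terminal side) can be made concrete with a short Python sketch. The function name, its parameters, and the example sequence are hypothetical and serve only to illustrate the numbering; the sketch labels residues and flags basic ones, and makes no claim about which PC would actually cleave the site.

```python
BASIC = set("KR")  # Lys and Arg, one-letter codes


def label_subsites(seq, p1_position, n_left=4, n_right=2):
    """Label the residues flanking a scissile bond with P/P' notation.

    seq:         substrate sequence in one-letter code.
    p1_position: 1-based position of the P1 residue (cleavage occurs
                 immediately C-terminal to it).
    Returns a dict such as {"P1": "R", "P2": "K", ..., "P1'": "S"}.
    """
    labels = {}
    for k in range(1, n_left + 1):
        idx = p1_position - k          # 0-based index of residue Pk
        if idx >= 0:
            labels[f"P{k}"] = seq[idx]
    for k in range(1, n_right + 1):
        idx = p1_position + k - 1      # 0-based index of residue Pk'
        if idx < len(seq):
            labels[f"P{k}'"] = seq[idx]
    return labels


# Hypothetical substrate fragment, cleaved after the Arg at position 6.
example_labels = label_subsites("AARQKRSL", p1_position=6)
basic_subsites = [name for name, aa in example_labels.items() if aa in BASIC]
print(example_labels)
print("basic residues at:", basic_subsites)
```

In this made-up example the basic residues fall at P1, P2, and P4, the positions at which the text describes a preference for Arg or Lys, while P1′ and P2′ carry a polar and a hydrophobic residue, respectively.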

Consequently, since the number and the distribution of negatively charged residues in the active site of PCs seem to be important for their stringent substrate-specificity, the design of inhibitors with variable lengths and containing basic residues at essential subsites should discriminate between the PC family members. Indeed, d-poly-Arg peptides have been shown to be potent and relatively selective inhibitors of furin, and their inhibitory potency has been found to be proportional to their length [116, 117]. Moreover, some of these d-Arg oligopeptides were able to block the lethal effects of Pseudomonas aeruginosa exotoxin or to suppress HIV-1 infection in vivo. In addition to the structural and molecular modelling approach, other approaches were used for the development of potent and specific PC inhibitors [115]. For example, the endogenous approach used several naturally occurring sequences known to inhibit PCs in vivo, such as the prodomains of the proteases themselves, the neuroendocrine protein 7B2, or the proSAAS precursor [118]. Moreover, the chemical approach used non-peptidyl compounds that are ligands of Zn2+ and Cu2+ ions [119]. Because the PCs have the ability to compensate for each other, these inhibitors are not specific for one PC and may therefore affect multiple cellular functions, not only the target processes.

For these reasons, the cleavage preferences of PCs were evaluated by analyzing the relative efficiency of furin, PC2, PC4, PC5/6, PC7, and PACE4 in cleaving over 100 decapeptide sequences representing the Arg-Xaa-Xaa-Arg↓ motifs of human, bacterial, and viral proteins [120]. This comparative study showed that the cleavage preferences of PCs can be divided into three groups: {furin}, {PACE4, PC4, PC5/6, PC7}, and {PC2}. Since the PCs differ in their activation pathways [26], this observation allowed conclusions to be drawn concerning the relative significance of each PC in the processing of individual proteins of human and pathogen origin. For example, it was shown that PC5/6, PC7, and PACE4 contribute significantly to the processing of BACE1. Similarly, PC2 appears to contribute significantly to the processing of HIV-1 gp160 and the fusion protein precursor of parainfluenza virus, whereas it is extremely unlikely that this endoprotease plays a significant role in the processing of Notch and Ebola virus glycoproteins. In short, this study demonstrates that knowledge of the contributions of PCs to the proteolytic processing of normal proteins and of viral and bacterial pathogens is a prerequisite for selecting which PCs are promising drug targets in infectious diseases.
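As an illustration of how candidate Arg-Xaa-Xaa-Arg↓ sites can be located in a protein sequence, the following Python sketch scans a one-letter sequence for the minimal motif. It is a simple pattern search under stated assumptions, not the procedure used in the cited study [120]: the function name and the example sequence are hypothetical, and a motif match is only a candidate site, since actual cleavage also depends on the flanking residues, the local structure, and the PC involved.

```python
import re

# Lookahead so that overlapping motifs (e.g. R-X-X-R-X-X-R) are all reported.
RXXR = re.compile(r"(?=(R..R))")


def find_rxxr_sites(seq):
    """Return (p1_position, motif) for every Arg-Xaa-Xaa-Arg match.

    p1_position is the 1-based position of the C-terminal Arg (P1);
    the putative scissile bond lies immediately after it.
    """
    return [(m.start() + 4, m.group(1)) for m in RXXR.finditer(seq)]


# Hypothetical sequence containing two overlapping candidate motifs.
example_sites = find_rxxr_sites("AARAKRSVRR")
for position, motif in example_sites:
    print(f"candidate cleavage after residue {position} ({motif})")
```

The lookahead pattern is used because a plain `R..R` search would consume the shared Arg and miss the second of two overlapping motifs.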

Concluding remarks

The secretory pathway in cells possesses an elaborate set of endoproteolytic enzymes (PCs) that carry out a crucial step in protein precursor maturation. This step is the proteolytic processing of various polypeptide precursors by cleavage at the peptide bond C-terminal to the consensus pattern Arg-Xaa-Xaa-Arg. Members of the PC family (furin, PC1/3, PC2, PC4, PC5/6, PC7, and PACE4) play this central role in a variety of physiological processes by generating bioactive peptides (e.g., hormones and neuropeptides) or by activating growth and differentiation factors, matrix and plasma proteins, enzymes, and receptors that are implicated in many important physiological events. Moreover, structural and homology-modeling studies demonstrate more similarity than expected at the catalytic site of the seven PCs. The major conclusion highlighted in this review is that specific PC-substrate pairs do exist, but that there is substantial redundancy for the majority of substrates, which may be cell type- and even species-dependent. Accordingly, several studies have been undertaken to elucidate the relationship(s) between the sequence, the structure, and the cleavage of maturation sites of peptide and protein precursors. Indeed, secondary structure analysis of sequences exhibiting potential dibasic sites revealed the presence of turn structures in the vicinity of in vivo cleavage sites. It is noteworthy that these particular structures are solvent exposed and hence, as accessible sites, are involved in the molecular recognition of substrates by PCs. Moreover, the tendency to be solvent exposed confers more flexibility (plasticity) on the turn structures, a property that gives multibasic cleavage sites the capacity to rearrange the local environment of their subsites for altered modes of recognition by PCs. In addition, various bioinformatics models were developed to predict precursor cleavage sites based on the type and physicochemical properties of the amino acid residues located close to the cleavage site in the precursor sequence. All these studies conclude that the prediction of precursor cleavage sites is taxa-dependent and that accurate knowledge of peptide processing requires the simultaneous consideration of precursor families, species, and predictive approaches. These observations are discouraging for the development of drugs that selectively target individual PCs, especially because numerous studies have shown that PCs are crucial for the initiation and progression of many important diseases, most prominently several viral infections and cancers.

In conclusion, the rational engineering and design of PC inhibitors first requires the characterization of the respective contributions of the PCs to the processing of polypeptides involved in various diseases. Appropriate drug compounds should then maximize the inhibition of the activation of disease-associated substrates with minimal interference in normal physiological processes.