Introduction

Some bacteria and parasites can survive and reproduce inside phagocytic cells. Cell surface proteins, toxic proteins, and hydrolytic enzymes are among the common virulence factors produced by pathogenic bacteria. Microbial infections depend on a combination of virulence factors, the immune status of the host, and the innate resistance of the host. The molecular virulence of a pathogenic bacterium is usually determined by its proteins (Henderson and Nataro 2001). A proteinase is a hydrolytic enzyme that breaks peptide bonds in proteins and peptides. Proteinases are classified into 8 main clans based on their catalytic residues (Rawlings and Barrett 1993; Rawlings et al. 2004, 2006) and further grouped into 225 families with 71 peptide inhibitors (Rawlings et al. 2012).

A peptidase whose active site and catalytic site are unknown is grouped into an unassigned peptidase clan (Hicks et al. 2001). A clan is a group of families that are thought to share a common ancestor. According to MEROPS 9.5, 8 peptidase families (U32, U40, U49, U56, U57, U62, U69, and U72) belong to the unassigned peptidase clan. Members of these families are widely distributed in diverse microbial pathogens (Rawlings 2010; Rawlings et al. 2012) and play important roles in microbe-microbe and microbe-host interactions. Some members of this clan allow a pathogen to evade the host's immune response and to degrade extracellular components of the host tissues (Popoff and Bouvet 2009). Consequently, the redox potential of the infected tissues is lowered by the buildup of metabolites from protein degradation (Mittal et al. 2014). This may give the pathogens a competitive advantage and could contribute to microbial infections and diseases. However, potential associations between unassigned peptidases and microbial virulence are still debated, in contrast to those of their mammalian counterparts.

To date, a crystallographic structure has been solved only for the Thermotoga maritima putative modulator of DNA gyrase, a member of the unassigned peptidase U62 family (Rife et al. 2005). The specificity, structures, and biochemical functions of microbial unassigned peptidases are not yet known in as much detail as those of the human enzymes (Duarte et al. 2016). Most recently, the apparent involvement of members of this clan in microbial pathogenesis and virulence mechanisms has been analyzed extensively from the evolutionary aspects of their structures and functions (Sharov 2014; Chellapandi and Prisilla 2017).

Several evolutionary constraints have been identified from the structures and functions of some virulence proteins or protein families (Chellapandi et al. 2013, 2018, 2019; Prathiviraj et al. 2016; Prisilla et al. 2016, 2017). A common functional core is conserved and maintained across the unassigned peptidases of microbial pathogens by evolutionary pressure (Barrett and Rawlings 2007). In this study, evolutionary genetic analysis was carried out to understand the molecular involvement of unassigned peptidases in microbial virulence and pathogenesis. Current bioinformatics resources can help to formulate molecular hypotheses about their structures, functions, and evolution, which may in turn support the development of human therapeutics.

Materials and methods

Dataset

Unassigned peptidase sequences were retrieved from the UniProt database (http://www.uniprot.org/). These sequences were used as queries to identify similarity hits in microbial pathogens with the National Center for Biotechnology Information DELTA-BLAST program (Boratyn et al. 2012). The topology of conserved domains in each target peptidase was identified with the Simple Modular Architecture Research Tool 8.0 server (Letunic and Bork 2018). Sequences with low sequence similarity or only partially conserved domains were removed from the dataset (Table S1).

Evolutionary genetic analysis

Multiple sequence alignment was carried out separately for each family with ClustalX 2.0 software (Thompson et al. 1997). The aligned sequences were inspected manually and refined to obtain a reliable alignment. Homogeneous patterns among all lineages were searched with a neighbor-joining algorithm using the Molecular Evolutionary Genetics Analysis (MEGA) X software (Kumar et al. 2018). The final phylogenetic tree was constructed with 1000 bootstrap replicates and manually corrected. A supertree was constructed for all clan members with the COBALT server using the Fast Minimum Evolution algorithm (Papadopoulos and Agarwala 2007). Phylogenetic divergence across each family was calculated from the corresponding trees with SplitsTree 4.0 software (Huson and Bryant 2006) using the BioNJ algorithm. Evolutionary genetic analyses were performed with MEGA software. Recombination frequency and population-scaled mutation rate were calculated with Recombination Detection Program 3.0 software using the Recomb2007 method (Martin et al. 2010). Coefficients of type-I (θI) and type-II (θII) functional divergence were estimated with DIVERGE 3.0 software (Gu et al. 2013) using the Kimura model with 100 bootstraps. The strength of Darwinian selection acting on synonymous and non-synonymous substitution sites was estimated as the Ka/Ks ratio (ω) using the Ka/Ks calculator 2.0 (Wang et al. 2010) and the HyPhy 2.2.1 program (Pond et al. 2005). The evolution rate of each family was calculated with Rate4Site 2.01 software (Mayrose et al. 2004). Evolutionary patterns between pairs of similar sequences from microbial pathogens were inferred with EMBOSS ALIGN (http://www.ebi.ac.uk/Tools/psa/emboss_needle/) using different amino acid substitution matrices.
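The Ka/Ks ratio (ω) is conventionally read as follows: ω > 1 suggests positive selection, ω close to 1 suggests neutral evolution, and ω < 1 suggests purifying selection. The short Python sketch below illustrates only this reading of precomputed Ka and Ks values; it is not the procedure implemented in the Ka/Ks calculator or HyPhy. The function name `classify_selection` and the `tolerance` band around 1 are hypothetical choices made for illustration.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.05) -> str:
    """Label a selection regime from Ka and Ks via omega = Ka/Ks.

    `tolerance` is an illustrative assumption: omega values within
    1 +/- tolerance are treated as neutral.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    omega = ka / ks
    if omega > 1 + tolerance:
        return "positive"
    if omega < 1 - tolerance:
        return "purifying"
    return "neutral"


if __name__ == "__main__":
    # Made-up Ka/Ks pairs, not values from this study.
    for ka, ks in [(0.30, 0.10), (0.10, 0.10), (0.05, 0.50)]:
        print(ka, ks, classify_selection(ka, ks))
```

In practice, significance tests on ω (such as likelihood ratio tests) would be needed before drawing conclusions; the fixed tolerance here only stands in for that step.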

Structural and functional analysis

Structural homologs were searched in the Protein Data Bank with the Position-Specific Iterated Basic Local Alignment Search Tool (Altschul et al. 1997) and then used as structural templates to predict 3D structures of the target sequences with SWISS-MODEL (Biasini et al. 2014). The quality and accuracy of the predicted structures were analyzed with the Structural Analysis and Verification Server (https://services.mbi.ucla.edu/SAVES/). The protein folding rates of all structural classes were predicted from the target sequences with the FoldRate server using a multiple regression program (Gromiha et al. 2006). Sequentially conserved motifs and regular expression patterns across the unassigned peptidase families were discovered with MEME Suite 5.1.1 (Bailey et al. 2015). PROSITE signature matches and ProRule-associated functional and structural residues were detected using ScanProsite (de Castro et al. 2006).
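PROSITE-style patterns map closely onto regular expressions, which is how a motif scan can be reproduced locally. The following Python sketch converts a simple PROSITE pattern into a regular expression; it is an illustration of the pattern syntax, not the ScanProsite implementation, and the helper name `prosite_to_regex` is hypothetical. It handles `x` (any residue), `[..]` (allowed residues), `{..}` (excluded residues), `(n)` or `(n,m)` repeats, and the `<` and `>` anchors when they appear outside brackets.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regex string."""
    pattern = pattern.rstrip(".")  # PROSITE patterns may end with a period
    parts = []
    for element in pattern.split("-"):
        # Excluded residues: {ED} -> [^ED]
        element = element.replace("{", "[^").replace("}", "]")
        # Repetition counts: x(2) -> x{2}, x(2,4) -> x{2,4}
        element = element.replace("(", "{").replace(")", "}")
        # Any residue: x -> .
        element = element.replace("x", ".")
        # Sequence anchors: < is the N-terminus, > is the C-terminus
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}.")
    print(regex)
    # Toy sequence, not a real peptidase.
    print(bool(re.search(regex, "MKAQVGGGGKL")))
```

Patterns that place an anchor inside brackets (for example `[G>]`) are outside the scope of this sketch.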

Results

The members of this clan are found in diverse pathogenic bacteria, and many of them belong to the Enterobacteriaceae family (Table 1). Peptidases U32 and U62 are more widely distributed in microbes than the other families. Peptidases U32, U57, and U69 are mainly present in pathogenic bacteria, particularly Mycobacterium tuberculosis. Phylogenetic trees were constructed to infer the origin and evolution of microbial unassigned peptidases (Fig. 1 and Fig. S1). The families of this clan formed eight separate major groups, although some members of peptidases U57, U49, U69, and U62 fell outside their respective clusters. This suggests that the conserved domains of these families have diverged slightly, regardless of the full-length protein sequences. Some members of peptidase U49 diverged from the rest of the family and clustered with a few members of peptidase U69 (Salmonella enterica). Peptidase U32 clustered separately within microbial pathogens.

Fig. 1

Dendrogram for unassigned peptidase clan in the microbial pathogens, reconstructed by Fast Minimum Evolution algorithm in the COBALT server using closely related protein sequences obtained from diverse microbial pathogens. Datasets used for the phylogenetic tree construction and trees of individual families are available in the Supplementary file. The circular view of all microbial unassigned peptidases is represented in Fig. 1a. A condensed view of respective unassigned peptidase families is represented in Fig. 1b

Table 1 Description of unassigned peptidase clan in the microbial pathogens

Estimates of the genetic diversity of this clan showed that the overall transition/transversion ratio ranges from 0.60 to 1.28, with peptidase U57 being highly diverged through radical nucleotide substitutions (Table 2). The evolution rates (α) of peptidases U49 and U69 are relatively higher than those of the other families, suggesting that both families may undergo rate variation among sites. Estimates of phylogenetic distance and invariant sites are closely related to one another within each family. The phylogenetic diversity of peptidases U32 and U62 has expanded along with diverged functions (Fig. S2).
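For readers less familiar with the transition/transversion ratio, the following Python sketch counts both kinds of substitution between two aligned nucleotide sequences. It gives a raw, uncorrected count ratio, whereas model-based estimates such as those reported by MEGA correct for multiple substitutions, so the numbers are not directly comparable with Table 2. The function name `ts_tv_counts` and the toy sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_counts(seq1: str, seq2: str) -> tuple[int, int]:
    """Count transitions and transversions between two aligned sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    return transitions, transversions


if __name__ == "__main__":
    # Toy alignment, not data from this study.
    ts, tv = ts_tv_counts("ACGTACGT", "GCGTTCAA")
    print(ts, tv, ts / tv if tv else float("inf"))
```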

Table 2 Estimates of genetic diversity and Darwinian selection for unassigned peptidase clan in the microbial pathogens

Estimates of population-scaled recombination indicated that the nucleotide/amino acid diversity of peptidase U57 has expanded more than that of the other families. Duplication/shuffling events appear to affect the functional divergence of peptidases U62 and U72. Peptidases U49, U62, and U69 have diverged functionally under Darwinian positive selection (Table 2). This indicates that some members of this clan have diverged widely among microbial pathogens at the species level, whereas most of them are retained by neutral evolution within closely related species (Table S2). The conserved domains of this clan have not diverged separately, but a new function might have evolved at the same evolution rate in the microbial pathogens.

In addition, recombination events were observed as evolutionary constraints on the evolution of new functions or the maintenance of existing functions in the microbial pathogenic lineage. No recombination events were detected in peptidases U56 and U69. The recombination/mutability rate is notably higher in peptidases U72 and U49 than in the others (Tables S3-S4). Peptidase U62 is less conserved as a result of frequent substitution mutations (Fig. S3-S4). The primary coevolved residues contribute to the functional evolution of the members of this clan during the coevolution process (Table S5).

Peptidase U62 contains a PmbA-TldD domain that is expected to have undergone type-I divergence, in which the function is fixed in one group but variable in another (Table 3). A conserved domain across peptidase U72 has undergone type-II divergence. The rate of gene duplication events across peptidases U62 and U72 associated with type-I divergence is slightly higher than that associated with type-II divergence. An extensive analysis of common motifs provided clues to the catalytic function and substrate-binding specificity of each family in this clan (Fig. 2). Our results indicate that the peptidase motif is a common conserved functional core that has diverged for substrate recognition.

Fig. 2

WebLogo representations of the conserved functional motifs identified across the unassigned peptidase clan in the microbial pathogens. The residue frequencies are represented by their relative height and the site-specific probabilities as total column height, reflecting informative value for conservation and function

Table 3 Coefficients of functional divergence between homologous clusters of unassigned peptidase clan in the microbial pathogens, estimated by DIVERGE. The details of microorganisms in each cluster are available in Supplementary file

Homology models were generated for the understudied target peptidases (Fig. 3a). All modeled proteins showed 21–61% sequence identity to structural templates with dissimilar molecular functions (Table 4). All modeled proteins were validated for structural quality and accuracy using the Ramachandran plot. More than 90% of the modeled residues of each target protein fell in allowed regions of the plot, and only a few residues fell in disallowed regions. However, the structural motifs in the peptidases are analogous to those present in the structural templates. This suggests that a conserved fold and spatial structural arrangement might have evolved separately at different evolution rates. Fold rate prediction showed that the structures of peptidases U32 and U49 belong to the all-alpha class with a fast folding rate (Fig. 3b). The peptidase U56 structure comprises unknown structural classes with different folding rates. The peptidase U57 structure has a mixed structural class with a slow folding rate. The structures of peptidases U62, U69, and U72 have mixed structural classes with a fast folding rate. These results help to clarify how folding imprints relate to the arrangement of structural elements and the associated molecular functions.

Fig. 3

Graphical representation of protein models (a) and protein folding imprints (b) of unassigned peptidase clan in the microbial pathogens based on their fold rates

Table 4 Prediction of tertiary structures and validation of the target protein sequences from unassigned peptidase clan in the microbial pathogens using homology modeling

Discussion

Unassigned peptidase U32 family

Peptidase U32 degrades soluble fibrillar type I collagen and lacks a zinc-binding motif similar to that of the bacterial collagenases belonging to the peptidase M9 family (Kato et al. 1992). Peptidase U32 is essential for initial penetration of the host, facilitating the prompt establishment of infection with the concomitant progression of chronic periodontitis (Han et al. 2008; Grenier and La 2011; Figaj et al. 2019). The collagenolytic mechanism of this family remains speculative because of the existence of different functional regions and domain organizations. In this study, peptidase U62 from C. botulinum type A was closely related to the C. botulinum neurotoxin peptidase, as described earlier (Doxey et al. 2008). We also found that invariant sites in different functional regions and domains have influenced the phylogenetic divergence and functional expansion of this family within microbial pathogens. Microbial peptidase U32 might have evolved by gene duplication and divergence, as previously hypothesized for clostridial collagenases (Bond and Wart 1984; Matsushita et al. 1999). The members of peptidase U32 have a distinct structure but are functionally analogous to the collagenases as a result of functional convergence (Galperin et al. 2012).

Unassigned peptidase U49 family

Lit is a constitutively expressed peptidase that hydrolyzes a single peptide bond within the universally conserved switch region (R58GV-ITI motif) of the host translation factor EF-Tu1. A putative zinc-binding catalytic motif H160EXXHX67H232 and His169 are important for the Lit peptidase activity that mediates phage exclusion in Escherichia coli K-12 (Copeland and Kleanthous 2005). EF-Tu1 forms a weak complex with the major capsid protein gp23 (Gol peptide), which serves as a signal of viral infection and ultimately leads to the arrest of protein synthesis and cell death before phage maturation (Georgiou et al. 1998; Bingham et al. 2000). In our study, Lit peptidase shared very little global similarity with other peptidases. Although most members of this family are related, a few diverged from the family and clustered with peptidase U69. This suggests that functional purification of this family in microbial pathogens may be imposed by strain-specific neutral evolution at a slow non-synonymous substitution rate.
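The motif notation H160EXXHX67H232 can be read as His160, Glu161, two arbitrary residues, His164, a spacer of 67 arbitrary residues (positions 165 to 231), and His232. The Python sketch below checks that reading with a regular expression on a synthetic placeholder sequence. The sequence is not the real Lit protein, and `find_zinc_motif` is a hypothetical helper used only to illustrate the position arithmetic.

```python
import re

# H-E-x-x-H, then 67 arbitrary residues, then H.
ZINC_MOTIF = re.compile(r"HE..H.{67}H")


def find_zinc_motif(sequence: str) -> list[tuple[int, int]]:
    """Return 1-based positions of the first and last His of each match."""
    return [(m.start() + 1, m.end()) for m in ZINC_MOTIF.finditer(sequence)]


if __name__ == "__main__":
    # Synthetic placeholder: 159 alanines, the motif, then a short tail.
    toy = "A" * 159 + "HEAAH" + "A" * 67 + "H" + "A" * 10
    print(find_zinc_motif(toy))  # [(160, 232)]
```

The returned pair (160, 232) matches the residue numbering in the motif name, which confirms that X67 denotes a 67-residue spacer rather than a residue at position 67.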

Unassigned peptidase U56 family

Homomultimeric peptidase (linocin M18) is an antilisterial bacteriocin that belongs to peptidase U56, which can hydrolyze chymotrypsin, trypsin, and casein. It has no amino acid sequence homology to any other peptidase (Boucabeille et al. 1997). Linocin M18 from T. maritima offers a perspective on antimicrobial ecological strategies in a hyperthermophilic niche (Hicks et al. 1998, 2001). As shown by our analysis, the members of this family have evolved under several evolutionary forces. We observed coevolution as an evolutionary force driving functional divergence across peptidase U56 in the microbial pathogens. This agrees with the earlier hypothesis on the coevolution of functionally constrained characters described by Wagner (1984).

Unassigned peptidase U57 family

A sporulation-specific peptidase (YabG) has a relatively complex proteolytic role in the maturation of spore cortex and coat assembly proteins such as cotF, cotT, yrbA, spoIVA, yeeK, and yxeE (Takamatsu et al. 2000a, b). The protein modification process involved in spore germination is mediated by comprehensive interactions among yabG, Tgl, and their substrates in the mature spores (Henriques and Moran 2000; Kuwana et al. 2006). Our study indicates that the evolution of this sporulation-specific family is quite complex, encompassing multiple tandem duplications (Galperin et al. 2012). We observed phylogenetic proximity between peptidase U57 and peptidase U72, suggestive of an inter-transitory evolutionary ancestry of these families. The Bacillus cytotoxicus sporulation-specific peptidase diverged from the rest of the family as a result of nucleotide/amino acid diversity during evolution. Interestingly, the members of this family diverged markedly more than the others because of amino acid substitutions and transition/transversion processes. Moreover, the protein structures of the members of this family might have evolved with mixed structural classes and a slow folding rate.

Unassigned peptidase U62 family

Microcin B17 is a ribosomally synthesized peptide produced by diverse strains of Gram-negative bacteria carrying a pMCCB17 plasmid (Allali et al. 2002). Microcin-processing peptidase 1 cleaves the microcin B17 precursor at the Gly26-Val27 bond. Consequently, it participates in controlling DNA gyrase activity through the control of cell death (ccdA) system of sex factor F, which ultimately leads to cell death (Murayama et al. 1996). Phylogenetic inference in our study indicated that the members of peptidase U62 were evolutionarily distinct from those of distantly related organisms. The conserved domain PmbA_TldD diverged among the family members, which could result from type-I divergence. Our results also suggest that the members of peptidase U62 might have evolved through recombination and amino acid substitutions, in accordance with earlier work (Bedau and Packard 2003; Clark et al. 2012).

Unassigned peptidase U69 family

Autotransporters are secreted proteins that are assembled in the outer membrane of bacterial cells. The AIDA-I self-cleaving autotransporter protein serves as a virulence protein in different toxigenic E. coli strains (Henderson and Nataro 2001). After proteolytic cleavage, its processed mature passenger domain is stabilized by non-covalent interaction with the 30 kDa β-domain. It has been reported to be involved in bacterial pathogenesis (Benz and Schmidt 1992, 1993). According to our study, the members of this family might undergo rate variation among sites during evolution, since their evolution rate differed from that of the other families (Bedau and Packard 2003). No recombination events were detected, and there was little evolutionary change toward functional divergence. Interestingly, selective strength appeared to act on protein function rather than gene function. Some members of this family might have evolved a specific and distinct function that is adaptive to the particular niche occupied by the microbial pathogens. As shown by our analysis, the structural domains of members of this family could have been evolutionarily optimized during the colonization of new niches, given that the overall passenger domain scaffold is widely divergent (Kostakioti and Stathopoulos 2006; Celik et al. 2012).

Unassigned peptidase U72 family

Deamidase of Pup (prokaryotic ubiquitin-like protein) (Dop) converts the C-terminal glutamine of Pup to glutamate, which is then linked through an isopeptide bond to the ε-amino group of lysine residues of target proteins by the enzyme PafA (proteasome accessory factor A) (Özcelik et al. 2012; Ofer et al. 2013; Prathiviraj and Chellapandi 2020). Dop is an evolutionary derivative of glutamine synthetases and carries GhExE as a distinctive signature (Iyer et al. 2009). Pup depupylase/deamidase belongs to the carboxylate-amine ligase family (Barandun et al. 2012). Dop and PafA are similar in overall structure and fold, with differences in the loop regions. The active site is located in a broad β-sheet cradle accessible at one end (Özcelik et al. 2012). The Pup-proteasome system is being evaluated as a promising drug target for emerging multi-drug-resistant M. tuberculosis strains (Ofer et al. 2013; Prathiviraj and Chellapandi 2020). In the present study, the Pup protein has a conserved motif with a G[EQ] signature at the C-terminus. It was structurally unrelated to the ubiquitin fold, and its function as a protein modifier has converged. We also found that the phyletic pattern of mycobacterial PafA was closely related to that of Pup and that both proteins were genomic neighbors in all bacterial lineages. We observed a functional linkage between Pup/PafA and the archaeal-type proteasome (Pearce et al. 2008). The rate of gene duplication events across peptidase U72 associated with type-I divergence was slightly higher than that associated with type-II divergence. A new molecular function in members of this family might have evolved through radical changes in the segregating sites with a high recombination rate, as described by Clark et al. (2012).

Conclusions

We conducted a detailed proteomic survey to understand the molecular and functional diversity of the unassigned peptidase clan in microbial pathogens. The members of this clan are evolutionarily related, but their sequence and functional diversity are specific to individual strains. The peptidase domain in the respective families is conserved across genera and has evolved distinct substrate specificities that support the full virulence of microbial pathogens. Genetic drift, amino acid substitutions, and coevolution are the major evolutionary constraints acting on the functional divergence of this clan in microbial pathogens occupying different pathological niches. Darwinian positive selection is an important evolutionary process that determines the neofunctionalization of highly diverged family members from the conserved functional core of this clan. The present study describes the apparent involvement of members of this clan in the molecular pathogenesis and virulence of clinically important pathogens. The evolutionary history and origin of this clan therefore offer new ideas for the development of antimicrobial agents targeting these understudied proteins. The proteins examined in this study are promising targets for antibiotic drug discovery and for addressing the development of resistance to many classes of clinically used antibiotics. Nevertheless, experimental evaluation of the target proteins is essential to establish their structural, functional, and biochemical characteristics during the infection process.