
Introduction

Proteolysis is a post-translational modification that is widespread among cell surface membrane proteins, regulating their localization, lifetimes, and biological activities (Puente et al. 2003; Rawlings et al. 2017). Proteolytic proteoforms arise at the cell surface through numerous biological processes (Fig. 1). Many plasma membrane proteins are cleaved by signal peptidase during their translocation across the endoplasmic reticulum membrane (Loureiro et al. 2006), while others are activated by protease cleavage in compartments of the secretory pathway before insertion into the plasma membrane (Tian et al. 2011). Once in the plasma membrane, cell surface proteins may undergo further cleavages that regulate their activity, including ectodomain shedding, a proteolytic process that releases all or part of a protein’s extracellular domain from the cell (Lichtenthaler et al. 2018). Ectodomain shedding can confer a gain of function by releasing biologically active ectodomains such as cytokines (Black et al. 1997; Moss et al. 1997) or growth factors (Blobel 2005), or a loss of function when the full-length protein is required for activity, as is the case for many cell surface receptors. Although cell surface proteolysis was once believed to affect only a handful of proteins, it is now appreciated as a fundamental mechanism of biological regulation, controlling how cells communicate with one another and how they respond to environmental cues (Lichtenthaler et al. 2018).

Fig. 1 Proteolytic proteoforms at the cell surface. Proteolytic proteoforms at the cell surface arise as a result of receptor ectodomain shedding (top left), release of soluble signals such as growth factors and cytokines (top right), cleavage by signal peptidase during protein translocation (bottom left), and regulatory cleavage events

Our understanding of cell surface proteolysis has largely been shaped by the tools available for detecting it. Although proteolytic cleavage events are essential for many biological processes, functionally important cleavages are often undetectable by conventional proteomics approaches, which primarily measure changes in protein abundance, because a single cleavage may leave the overall abundance of a protein largely unchanged. Here, I provide a perspective on proteomics-based protease substrate identification and on how these methods can be applied to probe proteolysis at the cell surface. I focus in particular on subtiligase-TM (Weeks et al. 2021), a transformative new chemoenzymatic tool designed for the challenging task of identifying proteolytic cleavage sites at the cell surface with single amino acid resolution.

Techniques for Global Identification of Protease Substrates

Over the past two decades, the pace of protease substrate discovery has accelerated with the development of numerous methods for identifying proteolytic cleavage events by mass spectrometry (MS)-based proteomics. One set of techniques, exemplified by the Protein Topography and Migration Analysis Platform (PROTOMAP) (Dix et al. 2008; Simon et al. 2009), identifies protease substrates and the extent to which they are cleaved. This method relies on changes in the migration of proteolytically cleaved proteins during gel electrophoresis, which separate cleaved and uncleaved proteoforms prior to MS analysis. PROTOMAP has the advantage of providing information about the abundance of proteolytic proteoforms, which can help in interpreting the potential biological significance of cleavage events that are hypothesized to produce loss-of-function phenotypes. However, PROTOMAP is poorly suited for identifying the exact sites of proteolytic cleavage.
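To make the migration-based logic concrete, the Python sketch below flags proteins whose peptides are detected in gel bands well below the protein's predicted full-length mass. It is a hypothetical illustration, not the PROTOMAP software: the `GelObservation` record, the `flag_migration_shifts` function, the protein names, and the 0.8 cutoff are all assumptions. It also shows why this kind of readout stops short of cleavage-site resolution, since the output is a shifted band rather than a specific scissile bond.

```python
from dataclasses import dataclass


@dataclass
class GelObservation:
    """Hypothetical record: one protein detected in one gel band."""
    protein: str
    predicted_mw_kda: float  # predicted mass of the full-length protein
    band_mw_kda: float       # apparent mass of the band where its peptides were found


def flag_migration_shifts(observations, max_ratio=0.8):
    """Return (protein, ratio) pairs for bands migrating well below full length.

    max_ratio is an assumed cutoff: a band whose apparent mass is below
    max_ratio * predicted mass is treated as a candidate cleavage product.
    """
    flagged = []
    for obs in observations:
        ratio = obs.band_mw_kda / obs.predicted_mw_kda
        if ratio < max_ratio:
            flagged.append((obs.protein, round(ratio, 2)))
    return flagged


# Hypothetical data: protein_A appears near full length and as a ~55 kDa fragment.
observations = [
    GelObservation("protein_A", 100.0, 98.0),
    GelObservation("protein_A", 100.0, 55.0),
    GelObservation("protein_B", 60.0, 61.0),
]
print(flag_migration_shifts(observations))  # → [('protein_A', 0.55)]
```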

Another class of protease substrate identification strategies, collectively known as N terminomics methods, enables global profiling of both protease substrates and the exact sites of proteolytic cleavage. In N terminomics methods, cellular N termini are isolated either by selective modification of protein N termini or by depletion of the internal peptides formed after trypsin digestion of the sample. In methods based on depletion of internal peptides, all amines in the proteome of interest, including both protein N termini and lysine side chains, are blocked before trypsin digestion. In Combined Fractional Diagonal Chromatography (COFRADIC) (Gevaert et al. 2003; Damme et al. 2005), the sample is then digested with trypsin, and the newly formed N termini are modified with a chemically distinct functional group, enabling chromatographic separation of internal peptides prior to MS analysis. A related strategy, Charge-based Fractional Diagonal Chromatography (ChaFRADIC), separates internal peptides chromatographically on the basis of their distinct charge states (Venne et al. 2013). ChaFRADIC has been adapted into a pipette tip-based format, ChaFRAtip, in which the chromatography is performed inside a pipette tip, enabling enrichment of thousands of N-terminal peptides from < 5 µg of starting material (Shema et al. 2018). In another approach, Terminal Amine Isotopic Labeling of Substrates (TAILS) (Kleifeld et al. 2010), the sample is digested with trypsin and the new N termini are reacted with a branched polyglycerol aldehyde polymer in the presence of the reducing agent sodium cyanoborohydride. The polymer-bound internal peptides are then removed by filtration, leaving the N-terminal peptides for MS analysis. Building on the TAILS concept, a recently developed method, High-efficiency Undecanal-based N Termini EnRichment (HUNTER), replaces protein precipitation with the single-pot solid-phase-enhanced sample preparation (SP3) method (Hughes et al. 2019) for proteome purification and uses undecanal to modify internal tryptic N termini, simplifying their depletion (Weng et al. 2019). These two innovations enable enrichment and identification of thousands of N termini from < 2 µg of starting material. Although COFRADIC, ChaFRADIC, TAILS, and HUNTER have been widely used to profile protease substrates, they share two limitations: they may have low sensitivity for low-abundance proteolytic N termini, and they cannot be targeted to specific locations within the cell.
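The depletion logic shared by these negative enrichment methods can be sketched in a few lines of Python. The sketch assumes that, once lysine side chains are blocked, trypsin cleaves only after arginine (an ArgC-like specificity), and it applies the simplified rule that trypsin does not cleave before proline. The function names and the example sequence are hypothetical. The point is that only the peptide carrying the blocked, original N terminus of the proteoform survives depletion, whereas every internal peptide acquires a free α-amine during digestion and is removed.

```python
import re


def argc_like_digest(sequence):
    """In silico trypsin digest of a proteoform whose lysines are blocked.

    Assumption: blocked lysines are not recognized, so cleavage occurs only
    C-terminal to arginine and (simplified rule) not when proline follows.
    """
    # Zero-width split after every R that is not followed by P (Python 3.7+).
    return [p for p in re.split(r"(?<=R)(?!P)", sequence) if p]


def negative_enrichment(sequence):
    """Split digest peptides into retained and depleted lists.

    The first peptide carries the proteoform's N terminus, whose amine was
    blocked before digestion, so it is retained. Every other peptide gains
    a free alpha-amine from trypsin cleavage and is depleted (for example,
    by binding to a polymer or by a shift in chromatographic behavior).
    """
    peptides = argc_like_digest(sequence)
    return peptides[:1], peptides[1:]


# Hypothetical proteoform beginning at a protease-generated neo-N terminus.
retained, depleted = negative_enrichment("SLKDRPQMRGAFR")
print(retained, depleted)  # → ['SLKDRPQMR'] ['GAFR']
```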

Positive enrichment strategies for N terminomics offer high sensitivity for low-abundance proteolytic neo-N termini, which may nonetheless play important roles in biology. A key challenge in positive enrichment N terminomics is the need to modify protein N-terminal α-amines selectively, but not lysine ε-amines, so that N-terminal peptides can be enriched. One approach to selective N-terminal modification relies on enzyme–substrate molecular recognition by subtiligase, an engineered peptide ligase derived from the serine protease subtilisin (Abrahmsen et al. 1991; Weeks and Wells 2019). In subtiligase N terminomics, the enzyme ligates a biotinylated peptide ester substrate onto protein or peptide N-terminal α-amines (Mahrus et al. 2008; Weeks and Wells 2019). Another positive enrichment approach, Chemical enrichment of Protease Substrates (CHOPS), uses an N terminus-selective 2-pyridinecarboxaldehyde–biotin probe to biotinylate protein N termini (MacDonald et al. 2015; Griswold et al. 2019). In both subtiligase N terminomics and CHOPS, biotinylated N termini can be enriched on avidin beads and selectively eluted for global sequencing of proteolytic cleavage sites by LC–MS/MS.
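Because the enriched peptide begins exactly at the N terminus that was labeled, its position in the parent protein sequence pinpoints the cleavage site. The Python sketch below, offered as an illustration rather than as part of any published pipeline, maps a sequenced N-terminal peptide onto a protein and reports the P4–P1 and P1'–P4' residues flanking the scissile bond (Schechter–Berger nomenclature). The function name, the example sequence (built around a caspase-like DEVD|G site), and the simple classification rules are assumptions.

```python
def map_cleavage_site(protein_seq, peptide, window=4):
    """Locate the cleavage site implied by a sequenced N-terminal peptide.

    Assumes the peptide starts exactly at the labeled N terminus. If the
    peptide occurs more than once, only the first match is reported.
    """
    start = protein_seq.find(peptide)  # 0-based index of the P1' residue
    if start < 0:
        raise ValueError("peptide not found in protein sequence")
    if start == 0:
        kind = "original protein N terminus"
    elif start == 1 and protein_seq[0] == "M":
        kind = "initiator methionine removed"
    else:
        kind = "neo-N terminus (internal cleavage)"
    return {
        "kind": kind,
        # 1-based residue number of P1 (0 means no residue precedes the peptide)
        "p1_position": start,
        "P4-P1": protein_seq[max(0, start - window):start],
        "P1'-P4'": protein_seq[start:start + window],
    }


# Hypothetical protein carrying a caspase-like DEVD|G site.
site = map_cleavage_site("MSDEVDGAPLK", "GAPLK")
print(site["kind"], site["p1_position"], site["P4-P1"], site["P1'-P4'"])
# → neo-N terminus (internal cleavage) 6 DEVD GAPL
```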

Targeting Protease Substrates at the Cell Surface

Despite the many successes of N terminomics methods in identifying protease substrates, these techniques have limitations that make them poorly suited for studying proteolytic cleavage events at the cell surface. Cell surface proteins tend to be less abundant than cytoplasmic and cytoskeletal proteins, so they often escape detection in MS proteomics studies unless they are enriched before analysis (Wollscheid et al. 2009; Bausch-Fluck et al. 2015). Because the N terminomics techniques described above isolate protein N termini after cell lysis, information about the subcellular location from which the N termini are derived is lost, precluding enrichment of plasma membrane proteins. Although N terminomics could be combined with techniques for plasma membrane isolation, these techniques often suffer from low specificity and would add steps to the workflow that are likely to cause sample losses. An ideal method for global mapping of proteolysis at the cell surface would separate cell surface proteins from intracellular proteins, function under physiological conditions, and provide single amino acid resolution of protease cleavage sites.

Recent strategies for targeted identification of protease substrates derived from the cell surface have focused on the secretome, the collection of secreted soluble proteins and proteolytically cleaved ectodomains that a cell releases into its surroundings. MS proteomics studies of the secretome have often been hampered by the need to grow cells in the presence of serum, which contains high concentrations of albumin and other serum proteins that are far more abundant than the proteins secreted by the cells. Early efforts to avoid this problem focused on isolating the secretome from cells grown in serum-free media (Tam et al. 2004; Hemming et al. 2009; Jefferson et al. 2011). However, the absence of serum often causes cell stress and decreased sheddase activity, and it is incompatible with the culture of many cell types, including primary cells. A more recent method, secretome protein enrichment with click sugars (SPECS) (Kuhn et al. 2012), was designed to enable isolation of the secretome in the presence of highly abundant serum proteins. SPECS takes advantage of the fact that most transmembrane and secreted proteins are glycosylated. In SPECS, cellular glycoproteins are metabolically labeled with azido sugars, enabling selective biotinylation of cell-derived, but not serum-derived, glycoproteins with copper-free click chemistry. These proteins can then be enriched from conditioned media and identified by MS proteomics. SPECS has provided powerful insights into protease biology, enabling identification of ADAM10 and BACE1 substrates in murine neurons and revealing that these proteases are major sheddases in the nervous system (Kuhn et al. 2012, 2016). An optimized version, high-performance SPECS (hiSPECS), has been used to isolate the secretome from as few as 10⁶ cells, leading to the identification and quantification of hundreds of secreted proteins and to cell type-resolved mapping of the mouse brain secretome (Tüshaus et al. 2020).
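One way to use secretome data to nominate substrates is to compare conditions: a protein whose shed ectodomain becomes less abundant in conditioned media when a candidate sheddase is deleted or inhibited is a candidate substrate of that protease. The Python sketch below captures only that comparison. The function name, intensities, protein names, and two-fold cutoff are hypothetical, and a real analysis would use replicate measurements, statistical testing, and a strategy for proteins that fall below the detection limit in one condition.

```python
import math


def candidate_substrates(control, perturbed, min_log2_drop=1.0):
    """Rank proteins whose secretome abundance drops when a protease is lost.

    control, perturbed: dicts mapping protein name -> measured intensity
    (perturbed = protease knockout or inhibitor-treated cells).
    min_log2_drop is an assumed cutoff; 1.0 means at least two-fold lower.
    """
    hits = []
    for protein, ctrl in control.items():
        pert = perturbed.get(protein)
        if pert is None or ctrl <= 0 or pert <= 0:
            continue  # this sketch skips proteins not quantified in both conditions
        log2_ratio = math.log2(pert / ctrl)
        if log2_ratio <= -min_log2_drop:
            hits.append((protein, round(log2_ratio, 2)))
    # Strongest reduction first.
    return sorted(hits, key=lambda hit: hit[1])


control = {"protein_A": 800.0, "protein_B": 500.0, "protein_C": 300.0}
perturbed = {"protein_A": 100.0, "protein_B": 480.0, "protein_C": 140.0}
print(candidate_substrates(control, perturbed))
# → [('protein_A', -3.0), ('protein_C', -1.1)]
```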

Mapping Proteolysis at the Cell Surface with Subtiligase-TM

Although SPECS and hiSPECS represent a major technological advance for identifying cell surface protease substrates, they cannot map protease cleavage sites with single amino acid resolution. We recently developed subtiligase-TM, a subtiligase variant that is genetically targeted to the extracellular side of the plasma membrane by fusion to the transmembrane domain of the PDGF receptor β chain (Weeks et al. 2021). Subtiligase-TM efficiently and specifically biotinylates N termini on the extracellular surface, enabling their enrichment, sequencing, and quantification by LC–MS/MS (Fig. 2). Using subtiligase-TM, we sequenced hundreds of cell surface N termini from HEK293T cells and demonstrated that the majority of the isolated N-terminal peptides are derived from annotated extracellular domains of transmembrane proteins. We also used subtiligase-TM for quantitative proteomics, quantifying proteolytic neo-N termini produced in response to pervanadate, a stimulus known to trigger ectodomain shedding. Notably, ~75% of the N termini that we identified using subtiligase-TM had not been observed in previous N terminomics datasets, demonstrating the potential of this tool to expand the universe of known protease cleavage sites on the cell surface.
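Checking whether enriched N termini fall within annotated extracellular domains is essentially an interval lookup. The sketch below assumes that topology annotations are available as 1-based, inclusive residue ranges (for example, topological domain annotations from a database such as UniProt). The function names, protein names, ranges, and observations are hypothetical, and N termini from proteins without any annotation are counted as outside.

```python
def in_extracellular_domain(start, extracellular_ranges):
    """True if residue `start` (1-based) lies in any annotated extracellular range."""
    return any(lo <= start <= hi for lo, hi in extracellular_ranges)


def extracellular_fraction(observations, topology):
    """Fraction of observed N termini that map to annotated extracellular domains.

    observations: list of (protein, start) pairs, where start is the 1-based
    position of the first residue of a sequenced N-terminal peptide.
    topology: dict mapping protein -> list of (lo, hi) extracellular ranges.
    """
    if not observations:
        return 0.0
    inside = sum(
        in_extracellular_domain(start, topology.get(protein, []))
        for protein, start in observations
    )
    return inside / len(observations)


# Hypothetical annotations: a single-pass receptor and a multi-pass protein.
topology = {"receptor_X": [(24, 530)], "transporter_Y": [(1, 40), (95, 130)]}
observations = [
    ("receptor_X", 24), ("receptor_X", 310),
    ("transporter_Y", 60), ("transporter_Y", 101),
    ("unannotated_Z", 12),
]
print(extracellular_fraction(observations, topology))  # → 0.6
```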

Fig. 2 Subtiligase-TM for capture, enrichment, and sequencing of proteolytic neo-N termini at the cell surface. (a) Subtiligase-TM biotinylates unblocked protein N termini at the cell surface. (b) Subtiligase-TM localizes at the cell surface. (c) After biotinylation with subtiligase-TM, cell surface N termini can be enriched and identified by LC–MS/MS

Subtiligase-TM combines the strengths of many earlier methods for identifying protease substrates, providing increased coverage of the cell surface proteome together with information about both the exact site of proteolytic cleavage and the subcellular location from which the proteolytic proteoform was derived. Because subtiligase-TM biotinylates the proteolytic fragment that remains associated with the cell rather than the fragment released into the media, it can also sequence cleavage events involved in processes other than ectodomain shedding, such as signal peptide cleavage, propeptide removal, enzyme maturation, and receptor activation. Subtiligase-TM has the additional advantage of being genetically encoded, which facilitates cell type-specific expression and makes it usable in any system that can be genetically manipulated, including cell lines, primary cells, and whole organisms. Cell surface proteolytic cleavage events are implicated in viral infection (Saeed et al. 2020), oncogenic transformation (Jackson et al. 2017), immunity (Khokha et al. 2013), neurodegenerative disease (Munro et al. 2016), and many other biological processes, and subtiligase-TM provides an entry point for understanding the role of proteolysis in each of these processes in molecular detail. Together, these features make subtiligase-TM a potentially transformative tool for studying cell surface proteolysis.
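Because the biotinylated fragment is the one that stays on the cell, each sequenced N terminus can be compared with annotated processing sites to suggest which kind of event produced it. The sketch below assumes annotations giving the last residue of the signal peptide and the residue range of a propeptide. The function, the feature names, and the exact-match rule are hypothetical, and a practical pipeline would need to handle ambiguous or unannotated boundaries.

```python
def classify_n_terminus(start, features):
    """Suggest the processing event behind an observed N terminus.

    start: 1-based position of the first residue of the sequenced peptide.
    features: dict with optional keys 'signal_peptide_end' (last residue of
    the signal peptide) and 'propeptide' ((first, last) residues).
    """
    if start == 1:
        return "original N terminus"
    sp_end = features.get("signal_peptide_end")
    if sp_end is not None and start == sp_end + 1:
        return "signal peptide cleavage"
    propeptide = features.get("propeptide")
    if propeptide is not None and start == propeptide[1] + 1:
        return "propeptide removal"
    return "other internal cleavage"


# Hypothetical receptor: signal peptide 1-22, propeptide 23-45.
features = {"signal_peptide_end": 22, "propeptide": (23, 45)}
for start in (23, 46, 210):
    print(start, classify_n_terminus(start, features))
# → 23 signal peptide cleavage
#   46 propeptide removal
#   210 other internal cleavage
```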

Although subtiligase-TM is a powerful tool for spatially resolved mapping of proteolysis, several challenges remain that are likely to be addressed in future work. While genetic encodability is an advantage in some cases, the requirement to express the enzyme in the cells of interest limits its use for some sample types. Future developments that target subtiligase to the plasma membrane by other means would extend cell surface proteolysis mapping to samples that cannot be genetically manipulated, such as primary human samples and non-model organisms. Subtiligase-TM is also limited by its N-terminal sequence specificity, modifying some N-terminal sequences more efficiently than others. Although numerous subtiligase variants with altered sequence specificity have been developed (Weeks and Wells 2018) and can be used in the context of subtiligase-TM (Weeks et al. 2021), this strategy requires additional experiments to achieve more comprehensive coverage of N-terminal sequences. Protein engineering efforts to develop subtiligase variants with even broader specificity would therefore substantially advance the use of subtiligase-TM to study diverse proteolytic signaling pathways. A further limitation is that subtiligase-TM is currently restricted to the outer surface of the cell because its substrate is cell impermeable. Developing cell-permeable subtiligase substrates would enable subtiligase to be targeted to intracellular locations for more comprehensive spatially resolved mapping of proteolytic cleavage events.
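The gain from running several subtiligase variants can be expressed as set arithmetic over the N termini each variant captures. In the sketch below, N termini are represented as hypothetical (protein, position) pairs and the variant names are placeholders. It reports the size of the combined dataset and how many N termini each variant contributes uniquely, which is one way to judge whether an additional experiment is worth running.

```python
def combined_coverage(variant_results):
    """Summarize N-terminal coverage across subtiligase variants.

    variant_results: dict mapping variant name -> set of (protein, position)
    N termini identified with that variant.
    Returns (size of the union, dict of variant -> number of unique N termini).
    """
    union = set().union(*variant_results.values())
    unique_counts = {}
    for name, termini in variant_results.items():
        others = set().union(
            *(t for other, t in variant_results.items() if other != name)
        )
        unique_counts[name] = len(termini - others)
    return len(union), unique_counts


# Hypothetical results from the wild-type enzyme and one specificity variant.
results = {
    "wild_type": {("protein_A", 24), ("protein_B", 31), ("protein_C", 102)},
    "variant_1": {("protein_A", 24), ("protein_D", 57)},
}
print(combined_coverage(results))
# → (4, {'wild_type': 2, 'variant_1': 1})
```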

Conclusion and Outlook

The development of new technologies for proteomics-based protease substrate identification has revealed that proteolytic regulation is far more common in biology than was previously appreciated. Enabled by these methodological advances, recent work has shown that cell surface proteolytic events regulate receptor activity, cell adhesion, and cellular signaling, and that this regulation is often disrupted in human diseases such as cancer. New tools, including subtiligase-TM and hiSPECS, are likely to drive further progress in this area by uncovering new protease substrates, connecting them to specific proteases, and facilitating targeted functional hypotheses about specific proteolytic cleavage events at the plasma membrane. In addition to identifying protease–substrate pairs, an important challenge going forward is to understand the spatial organization of proteolysis, both across cell types and at the subcellular level. Because subtiligase-TM is genetically encoded, it can in principle be expressed under cell type-specific promoters to probe the spatial regulation of proteolysis. Beyond the plasma membrane, subtiligase could also be targeted to other subcellular locations by genetic fusion with appropriate targeting domains, opening opportunities for spatially resolved mapping of intracellular proteolysis at organellar membranes and within specific cellular compartments. These anticipated developments will provide a detailed picture of proteolytic signaling that will advance our fundamental understanding of protease biology and holds promise for the discovery of new biomarkers and therapeutic targets relevant to human disease.