Introduction

Next-generation protein sequencing technologies are now emerging that have the potential to be as disruptive to the field of proteomics as next-generation DNA and RNA sequencing were to genomics and transcriptomics, respectively (Restrepo-Perez et al. 2018). Although RNA transcript levels within a cell or tissue can provide some information about protein abundance, the information derived from this measurement is limited since there is no general correlation between transcript abundance and protein level in a cell (Macaulay et al. 2017). Moreover, protein levels within a cell vary widely, with differences of up to 10 orders of magnitude between low- and high-abundance species (Ho et al. 2018; Ponomarenko et al. 2016; Schmidt et al. 2016). To identify low-abundance proteins, current proteomics methods largely rely on high-resolution mass spectrometry of purified biomarkers and/or detection using antibodies against known protein targets (Liu et al. 2019). In principle, these methods can accurately detect protein species down to attomolar concentrations in complex proteomic mixtures (Tsedilin et al. 2015). While highly effective, proteomic methods using detection with antibodies or identification by mass spectrometry rely on a priori knowledge of the proteins present in the sample. Such approaches therefore cannot easily identify novel or unanticipated protein species. Moreover, all proteomic techniques must contend with the challenges of identifying protein products resulting from variation in gene translation as well as the numerous post-translational modifications (PTMs) of proteins. New proteomic methods that enable improved, non-targeted identification of low-abundance protein species and more robust detection of PTMs therefore continue to be sought (Parker et al. 2010; Yu et al. 2009).

Currently, several different technologies are being developed for next-generation protein sequencing that could address gaps in proteomic measurement technologies (Callahan et al. 2020; Robertson and Reiner 2018; Rodriques et al. 2019; Tang 2018). These technologies include nanopores, electron-tunneling spectroscopy, fluorescent imaging of protease-digested proteins, the multiplexing of high-sensitivity mass spectrometry, and fluorescent imaging of immobilized peptides. For the latter, readout based on selective chemical tagging of amino acids with fluorophores has been proposed (Swaminathan et al. 2018). Alternatively, a fluorescently tagged protein, nucleic acid, or other reagent that binds specifically to the N-terminal amino acid on a peptide could allow sequential readout of the sequence. This biomolecule, referred to as a NAAB (N-terminal amino acid binder) affinity reagent (or binding reagent), is used to specifically bind the N-terminal amino acid of peptides generated by enzymatic digestion of a protein or proteomic sample (derived from an organism, cell, or tissue) (Fig. 1a). The peptides are immobilized on a surface via the C-terminal amino acid, leaving the N-terminus exposed for recognition by the NAAB affinity reagent (Fig. 1b). The exposed N-terminal amino acid of each immobilized peptide is then identified via the binding of one of the NAAB affinity reagents. Following binding and fluorescent identification, the terminal amino acid is removed using Edman chemistry, exposing the next amino acid on the polypeptide chain for subsequent identification.
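The bind, image, and remove cycle described above can be sketched as a short simulation. This is only an idealized illustration, not a model of any real instrument: the function name `sequence_peptide`, the `Panel` type, and the perfectly specific `ideal_panel` are hypothetical, and each NAAB is reduced to a yes/no predicate on the exposed N-terminal residue.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical reagent panel: each NAAB is modeled as a predicate on the
# N-terminal residue (one-letter code). Real binders are not this clean.
Panel = Dict[str, Callable[[str], bool]]


def sequence_peptide(peptide: str, panel: Panel,
                     max_cycles: Optional[int] = None) -> List[str]:
    """Simulate bind -> image -> Edman removal for one immobilized peptide.

    Returns one call per cycle: the name of the single NAAB that bound,
    or "?" when zero or several reagents bound (an ambiguous image).
    """
    calls: List[str] = []
    remaining = peptide  # C-terminus is tethered; index 0 is the exposed N-terminus
    cycles = len(peptide) if max_cycles is None else min(max_cycles, len(peptide))
    for _ in range(cycles):
        n_term = remaining[0]
        bound = [name for name, binds in panel.items() if binds(n_term)]
        calls.append(bound[0] if len(bound) == 1 else "?")
        remaining = remaining[1:]  # Edman chemistry removes the N-terminal residue
    return calls


# Idealized panel: one perfectly specific binder for each residue in the
# example peptide Phe-Ile-Leu-Arg-Phe-Met (assumption for illustration only).
ideal_panel: Panel = {aa: (lambda r, aa=aa: r == aa) for aa in "FILRM"}

print(sequence_peptide("FILRFM", ideal_panel))
```

With only a Phe-specific binder in the panel, the same peptide would give a call at cycles 1 and 5 and "?" elsewhere, which illustrates why a panel covering many residues (or residue classes) is needed for a full read.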

Fig. 1

Using NAABs in a protein sequencing platform. a Schematic representation of the workflow for using NAABs in protein sequencing. The protein sample is extracted from the cell, tissue, etc., followed by enzymatic digestion with a specific protease. The peptides generated by the digestion are attached via the C-terminus onto a microscope slide. The fluorescently labeled NAABs are added to bind the N-terminal amino acid before an image is taken to identify the terminal residue. The NAABs are washed away from the peptides before the removal of the N-terminal residue via Edman degradation. The process is repeated until the entire sequence of each peptide has been read. b An example of the specificity required from a NAAB. In the peptide (Phe-Ile-Leu-Arg-Phe-Met), the NAAB needs to recognize the N-terminal Phe (red) but not the Phe in the middle of the peptide (green)

Biomolecules currently in development as NAAB affinity reagents are based on both protein and nucleic acid molecules. Here, we summarize the different candidate molecules under consideration for development as NAAB affinity reagents and the advantages and limitations of each. The amenability of each candidate to be engineered for new functions will also be discussed.

Required properties of a NAAB affinity reagent

In order to function as a NAAB affinity reagent in a protein sequencing apparatus, a candidate biomolecule needs to meet several requirements (Table 1). Of these requirements, one of the most important is the ability to bind with high specificity. One would like the reagents to bind specific amino acids at the N-terminus of a peptide while discriminating against all other amino acids, including the same amino acid found at an internal position within a given peptide (Fig. 1b). In some cases, however, binding a class of amino acids (e.g., aromatic residues) may also be useful. Ideally, the reagents should be developed with only the terminal amino acid contributing to binding. However, for some candidate reagent proteins, the residues at the penultimate positions of the target peptide (P2 and P3, the 2nd and 3rd amino acids from the N-terminus in the peptide) may influence the binding affinity. The molecular weight (size) and oligomeric state (complexity) of the NAAB affinity reagent should also be taken into consideration. The larger the protein, the more complicated it can be to engineer specific binding, and the greater the chance of non-specific binding to non-cognate peptides, the surface of the sequencing apparatus, and/or other NAAB affinity reagents. A large protein may also result in steric exclusion, preventing the reagent from binding a peptide that is attached to a surface. In addition, monomeric proteins are much preferred because they limit the complexity of the system and simplify NAAB affinity reagent engineering: many directed evolution approaches involve display of monomeric proteins, and creating a DNA library of mutants can be simpler with smaller genes. Finally, if the reagent is an enzyme (e.g., a peptidase), it may require mutations to knock out undesired enzymatic activity while maintaining binding affinity.
The sequencing apparatus will also likely work in aqueous solution, and thus the reagents need to be soluble and compatible with other requirements of the sequencing workflow (e.g., pH and temperature ranges). Furthermore, it will be advantageous if the reagents that recognize different N-terminal residues can function under similar conditions such as buffer composition, temperature, and pH. This will allow their combination into “one pot” during the sequencing process. Some natural candidates for NAAB affinity reagents require a cofactor for binding. Such a requirement may not be ideal in a “one pot” reaction, as the cofactor may not be compatible with other binding reagents.
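To make the specificity requirement more concrete, it can be expressed with the standard single-site equilibrium binding relation. This is a simplified sketch that assumes 1:1 binding, a free reagent concentration $[R]$ in large excess over the immobilized peptide, and no surface or avidity effects; the symbols $\theta$, $K_d$, and $S$ are introduced here for illustration and do not come from the studies cited above.

```latex
% Fraction of peptide molecules occupied by the reagent at equilibrium
\theta = \frac{[R]}{[R] + K_d}

% Selectivity of a reagent for its target N-terminal residue over an
% off-target residue, expressed as a ratio of dissociation constants
S = \frac{K_d^{\text{off}}}{K_d^{\text{target}}}

% Worked example: at [R] = K_d^{target} and S = 100
\theta_{\text{target}} = \frac{1}{1 + 1} = 0.5,
\qquad
\theta_{\text{off}} = \frac{1}{1 + 100} \approx 0.0099
```

Under these assumptions, a 100-fold difference in dissociation constant gives roughly a 50-fold difference in occupancy at this reagent concentration; raising $[R]$ increases target occupancy but also reduces the occupancy contrast between target and off-target peptides.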

Table 1 Properties of candidates to be developed as affinity reagent

Protein-based NAAB affinity reagents

There are a number of protein families that recognize specific amino acids, either as free amino acids in solution or as a part of a polypeptide. These proteins provide potential leads for development of NAAB affinity reagents and could find use in their native form. In most cases, however, protein engineering of the native sequence will be required to improve their properties to meet the affinity and selectivity requirements for use in the next-generation sequencing application.

Aminoacyl-tRNA synthetases

Aminoacyl-tRNA synthetases (aaRSs, also known as ARSs) are a family of enzymes that catalyze the attachment of the correct amino acid onto its cognate tRNA (Ibba and Soll 2000; Rajendran et al. 2018). In the first step of the reaction, the synthetase binds its cognate amino acid and, in an ATP-dependent manner, forms an aminoacyl-adenylate (enzyme-amino acid-AMP complex), releasing a pyrophosphate in the process. In the next step, the adenylate-aaRS complex binds to the cognate tRNA, and the amino acid is transferred from the aa-AMP to the tRNA, forming the “charged tRNA” that is used for protein synthesis.

aaRSs form a large and diverse family of proteins, each of which can correctly bind a specific amino acid corresponding to the specific tRNA (Rajendran et al. 2018). There are therefore at least 20 aaRSs in each organism that bind the 20 different canonical amino acids. Due to their crucial role in protein synthesis, aaRS enzymes must recognize their cognate amino acids with high fidelity. Each mistake the enzyme makes will result in a mis-incorporated amino acid during protein synthesis. Thus, aaRSs might be considered good candidates to be developed as amino acid-specific binding reagents. They exhibit the high specificity required of a NAAB affinity reagent. In addition, they are present in organisms from all three domains of life. The presence of the proteins in organisms from different environments may be beneficial for reagent development. For example, if a thermostable reagent is needed, the aaRS can be isolated from a thermophilic organism. Similarly, if a reagent that can tolerate low or high pH is required, the aaRS can be isolated from an acidophile or alkaliphile, respectively.

However, there are several aspects of aaRSs that make them less attractive as NAAB affinity reagents (Table 1). They evolved to recognize free amino acids with high specificity, but would need to be engineered to recognize an amino acid as the N-terminal residue of a peptide, and likely in the absence of the cognate tRNA. Engineering an aaRS to bind an amino acid located at the N-terminus of a peptide may affect the affinity and/or specificity of the enzyme. Another drawback is that some aaRSs are large multidomain proteins, and many are active as homo- or hetero-multimers. On the other hand, several aaRSs have been engineered to bind non-canonical amino acids (ncAAs) (for example, see Crnkovic et al. 2019). This suggests that similar approaches may be used to engineer the proteins to bind terminal amino acids on peptides.

Periplasmic binding proteins

PBPs (periplasmic binding proteins) comprise a large number of diverse proteins found in bacteria and archaea, and are involved in the transport of small molecules and ions into the cytoplasm. PBPs play a key role in providing essential nutrients such as vitamins, lipids, metal ions, peptides, and amino acids to the cell (Fehr et al. 2004; Sandhu et al. 2017). The proteins are monomeric and consist of two globular domains connected by a hinge region, with a ligand-binding site located in a cleft formed between the two domains. These proteins adopt two different conformations related to ligand binding: an open, ligand-free structure, and a closed form adopted upon ligand binding, which involves a large movement of the hinge region between the two domains. The ligand-mediated conformational change is an attractive feature of PBPs for engineering and has been exploited to develop proteins with new functions as biosensors, allosteric control elements, and more (for example, see Dwyer and Hellinga 2004; Ko et al. 2017; Ribeiro et al. 2019).

In several reported cases, it has been shown that PBPs form complexes with nanomolar affinities and that the proteins can be engineered for new functions (e.g., as biosensors; Guntas et al. 2005; Ko et al. 2017; Tullman et al. 2016). Moreover, several PBPs found in nature have evolved to bind free amino acids including Arg, His, Ile, Leu, Pro, and Val (for example, see Ausili et al. 2013; Chu et al. 2013; Fulyani et al. 2013; Magnusson et al. 2004). As such, PBPs represent another good set of candidates for development as NAAB affinity reagents. However, in nature, the PBPs evolved to bind free amino acids (Table 1). As with aaRSs, it is not clear whether PBPs can be engineered to bind an amino acid at the N-terminal end of a peptide without losing specificity and/or affinity. Analysis of the crystal structures of apo PBPs and of PBPs in complex with free amino acid ligands shows that the carboxyl end of the amino acid often contributes to ligand binding (for example, see PDB IDs 4G4P, 4KPT, 4KR5, 4KQP (Fulyani et al. 2013), 4PRS, 4PRH (Ruggiero et al. 2014), and 1WDN (Sun et al. 1998)). On the other hand, the wealth of structures available for PBPs and PBP-ligand complexes provides valuable information for guiding structure-based design. In addition, PBPs can be found in bacteria growing in different environments and thus, if needed, PBPs that can tolerate extreme conditions can also be isolated (for example, high temperature (Ausili et al. 2013)). Although the concerns with developing the PBPs as binding reagents are similar to those with aaRSs, an advantage of the PBPs over aaRS proteins is that this family of proteins is often active as monomers and is relatively small in size.

Peptide transporters

The proteins described above, aaRSs and PBPs, evolved to bind free amino acids in solution and not in the context of a polypeptide chain. Thus, they may not be amenable to be developed as NAAB affinity reagents where the amino acid to be recognized is located at a peptide amino terminus. Looking further into ideal protein families for NAAB affinity reagent development, one might first ask, which proteins naturally bind peptides? In this respect, there are three main systems involved in peptide transportation across the cell membrane. Although these proteins often bind peptides without regard to amino acid specificity, they may be good starting points for the engineering of binding reagents.

Oligopeptide permease systems

Oligopeptide transport systems from bacteria are part of the large family of ABC (ATP-binding cassette) transporters (Maqbool et al. 2015; Monnet 2003; Tam and Saier Jr. 1993; Tanaka et al. 2018). ABC transporters are multisubunit complexes that include one or more extracellular proteins that capture the substrate, transmembrane protein(s), and membrane-associated proteins. One or more subunits have ATPase activity, as the translocation of the cargo across the membrane is fueled by ATP hydrolysis.

In bacteria, one of the oligopeptide transport systems, Opp (oligopeptide permease), is composed of five subunits: OppABCDF (Ames 1986; Monnet 2003). The OppA protein captures the peptides from the external medium; two transmembrane proteins, OppB and OppC, form the pore needed for peptide translocation through the membrane; and two membrane-associated cytoplasmic ATPase proteins, OppD and OppF, are members of the AAA+ (ATPases associated with diverse cellular activities) family of ATPases (Erzberger and Berger 2006; Ogura and Wilkinson 2001). The main difference between the Opp systems of gram-negative and gram-positive bacteria is that OppA is periplasmic in gram-negative bacteria, whereas in gram-positive bacteria the protein is associated with the membrane via an NH2-terminal lipomodification (Monnet 2003).

OppA proteins from several organisms have been studied, and they can bind peptides of various lengths from dipeptides to peptides several dozen amino acids in length (Detmers et al. 2000). OppA’s ability to bind a large variety of peptides and its presence in bacteria and archaea (Gogliettino et al. 2010) suggest that OppA proteins could be good candidates for development as NAAB affinity reagents. However, several aspects of this protein make it less attractive. Although the protein binds different sizes of peptides, it shows very little amino acid specificity, preferring positively charged residues in positions P3 and P4 (Monnet 2003). Proteins that show some preference in amino acid binding may be a better starting point for protein engineering. In addition, the OppA proteins are fairly large (around 60 kDa) (Monnet 2003), making them less attractive as binding reagents and large for available platforms for protein engineering through directed evolution.

Dipeptide permease systems

In addition to the Opp systems, bacteria contain another ABC-type transporter with similar organization (Weinberg and Maier 2007). The Dpp (dipeptide permease) system also contains five proteins (DppABCDF) and participates in di- and tri-peptide transport into the cell. Similar to OppA, the DppA protein is a periplasmic peptide-binding protein responsible for capturing di- and tri-peptides in the periplasmic space (Olson et al. 1991). Although di- or tri-peptides are too small to function in a sequencer apparatus, it is possible that the DppA protein can be engineered to bind longer peptides. However, as specificity is very important for a NAAB affinity reagent and DppA does not exhibit sequence specificity, it would also have to be engineered to meet this requirement. The need for intensive engineering makes DppA less likely to be an optimal candidate for development as a NAAB affinity reagent. However, only a small number of bacterial DppA proteins have been characterized. A DppA protein with amino acid specificity may yet be identified, which could provide a better starting point for protein engineering.

Proton coupled peptide transporters

The POT (proton-dependent oligopeptide transporter) family of proteins (also called PTR (peptide transport) family) is found in all three domains of life and belongs to the MFS (major facilitator superfamily) family of transporters (Newstead 2015; Reddy et al. 2012). Although these are large proteins (50–110 kDa; 500–1000 amino acids), POT appears to be mainly involved in the intake of small peptides including di- and tri-peptides. The proteins are monomeric and contain 12–18 aa transmembrane domains that derive their energy of transport from the import of protons.

POT proteins evolved to bind peptides and thus may appear to be good candidates. However, they have several characteristics that may challenge their development as NAAB affinity reagents (Table 1). They bind only short, di- and tri-peptides, yet a robust peptide sequencer requires binding of longer peptides. In addition, the POTs bind peptides without sequence specificity; the protein sequencing application requires specific binding to the N-terminal amino acid. Furthermore, the POT proteins are large. Another major problem in the development of POT as binding reagents is that they are membrane-bound proteins. As discussed above, for the development of protein sequencers, soluble, well-folded proteins are required. Most membrane proteins are not well folded and/or functional when purified without the membrane, even when detergents are used. Therefore, although POT proteins have evolved to bind peptides, these proteins may be considered less-desirable candidates to be developed as NAAB affinity reagents.

The N-end rule

The N-end rule is a highly regulated protein degradation pathway and is conserved in all life forms (Sriram et al. 2011; Tasaki et al. 2012; Varshavsky 1996; Varshavsky 2019). This pathway determines the half-life of a protein based on the identity of its N-terminal residue. N-terminal residues can therefore be classified into two broad groups: stabilizing or destabilizing. Proteins bearing an N-terminal destabilizing residue (N-degron) are recognized by a specific N-end rule recognition component (termed an N-recognin) and then delivered to a protease for degradation. For example, in Escherichia coli, proteins bearing positively charged and some aliphatic and aromatic residues at the N-terminus may have half-lives of a few minutes, whereas proteins with other N-terminal amino acids may be stable for several hours.

In eukarya and archaea, proteins destined for intracellular proteolysis are tagged for degradation by post-translational modification with a small protein: ubiquitin (Ub) in eukarya (Komander and Rape 2012; Swatek and Komander 2016), and small archaeal modifier proteins (SAMP) in archaea (Maupin-Furlow 2013). In both cases, the modifying proteins are covalently attached to Lys on the substrate, and the modified substrate is degraded by the proteasome. Most bacteria do not use a similar system. To date, the only exception is Mycobacterium tuberculosis, in which a prokaryotic ubiquitin-like protein (Pup) was identified and shown to be functionally analogous to Ub and SAMPs by providing the signal for degradation by the proteasome (Bode and Darwin 2014).

Many proteins participating in the N-end rule of bacteria and eukarya are good candidates to be developed as NAAB affinity reagents. Several of those are described below. A comprehensive list of candidate enzymes can be found in Varshavsky (2019).

ClpS

Most bacteria do not utilize protein modification as a signal for protein degradation via the N-end rule. The bacterial pathway contains two components, the ClpAP protease (Olivares et al. 2018) and the adaptor protein ClpS (Dougan et al. 2002). The ClpS protein binds an N-degron and delivers the protein to ClpAP for degradation. ClpS was shown to specifically recognize N-terminal Phe, Trp, Tyr, and Leu amino acids and this specificity makes ClpS a particularly good candidate to be developed as a binding reagent (Tullman et al. 2019) (Table 1). The protein displays two of the properties of a good binding reagent, the ability to bind peptides, and some specificity for specific N-terminal amino acids. However, one of the drawbacks of using ClpS is the effect of the amino acid in the P2 and P3 positions on binding affinity and selectivity. It was shown that different residues at the P2 and P3 positions affect ClpS binding to the N-terminal amino acid (Stein et al. 2016). Engineering of ClpS can reduce the effect of these penultimate amino acids and enhance its thermostability (Tullman et al. 2020).

UBR

In eukarya, protein degradation by the ubiquitin system controls the levels of many intracellular proteins. A substrate of the Ub system is conjugated to Ub through the action of three enzymes: E1, E2, and E3. In those systems, the N-degrons are recognized by the UBR box domain of the E3 ubiquitin ligases, called N-recognins. In yeast, the N-recognin protein is expressed by the ubr1 gene. UBR1 contains three types of substrate-binding sites. One site binds basic residues (Arg, Lys, and His) on the N-terminus of proteins or peptides (Choi et al. 2010; Tasaki et al. 2009). The second binding site recognizes bulky hydrophobic residues (Leu, Ile, Phe, Tyr, and Trp), and the third binding site recognizes internal degrons rather than N-terminal ones. It was found that the UBR box can also bind methylated peptides (Munoz-Escobar et al. 2017). The ability of the UBRs to bind proteins and peptides with some specificity for particular amino acids makes them potential candidates for development as NAAB affinity reagents. However, one of the drawbacks is their large size, about 200 kDa. This size will limit the ability to engineer the protein for enhanced specificity. On the other hand, there are two different sites that are involved in binding to hydrophobic and basic peptides. Thus, it may be possible to separate the two regions and generate smaller protein domains that are more amenable to protein engineering techniques. It is also possible that such smaller domains, similar in size to ClpS, may be sufficient for peptide binding (Lupas and Koretke 2003).
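The N-terminal specificities quoted in this section for ClpS and for the two N-degron-binding sites of UBR1 can be collected into a small lookup to show how overlapping, class-level binders might be combined. This is a hypothetical sketch: the dictionary keys are illustrative labels rather than established reagent names, binding is treated as all-or-nothing, and the P2/P3 effects discussed for ClpS are ignored.

```python
from typing import Dict, FrozenSet, Set

# N-terminal residues (one-letter codes) as listed in the text above.
# Keys are illustrative labels, not names of existing reagents.
RECOGNIZED: Dict[str, FrozenSet[str]] = {
    "ClpS": frozenset("FWYL"),
    "UBR1_basic_site": frozenset("RKH"),
    "UBR1_hydrophobic_site": frozenset("LIFYW"),
}


def binders_for(residue: str) -> Set[str]:
    """Return the labels of all listed binders expected to recognize `residue`."""
    return {name for name, aas in RECOGNIZED.items() if residue in aas}


def consistent_residues(bound: Set[str]) -> Set[str]:
    """Residues recognized by every binder in `bound` and by no other listed binder."""
    candidates: Set[str] = set().union(*RECOGNIZED.values())
    for name, aas in RECOGNIZED.items():
        # Keep residues the binder recognizes if it bound; drop them if it did not.
        candidates = candidates & aas if name in bound else candidates - aas
    return candidates


print(binders_for("R"))
print(consistent_residues({"UBR1_hydrophobic_site"}))
```

In this idealized picture, a signal from the hydrophobic site alone would point to Ile, because the other four hydrophobic residues would also be expected to bind ClpS. Real reagents with partial specificity would require a probabilistic treatment rather than set operations.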

Transferases

Several transferases were also shown to participate in the N-end rule process. The L/F transferase (leucyl/phenylalanyl-tRNA-protein transferase) was shown to catalyze the transfer of the destabilizing residues Leu and Phe from aminoacyl-tRNAs to the amino terminus of acceptor proteins bearing an N-terminal Arg, Lys, or Met, thus tagging them for degradation (Ninnis et al. 2009; Watanabe et al. 2007). Another example of a transferase that could be developed as a NAAB affinity reagent is the Ate1 protein. In eukarya, the Ate1 gene encodes an arginyl-transferase that transfers Arg to N-terminal Asp, Glu, and Cys residues (Hu et al. 2005). The N-terminal Arg is then recognized by ubiquitin ligases, leading to protein degradation. BPT is another example. This protein is an aminoacyl-transferase identified in Vibrio vulnificus. Although its sequence is similar to that of the eukaryotic arginyl-transferases, it conjugates Leu to Asp and Glu (Graciet et al. 2006). Another aminoacyl-transferase, termed ATEL1, was identified in Plasmodium falciparum. This protein shows sequence similarity to the bacterial L/F transferase but has a specificity similar to that of the Ate1 protein (Graciet et al. 2006).

These peptidyl transferases could be good candidates for NAAB affinity reagents if the enzymatic activity can be eliminated via protein engineering. Additionally, it may be possible to eliminate the need for a tRNA cofactor while retaining binding to the target peptide (Abramochkin and Shrader 1995).

Aminopeptidases

Aminopeptidases are a group of exopeptidases that catalyze the cleavage of the N-terminal amino acid from proteins or peptides (Ferroa et al. 2014; Gonzales and Robert-Baudouy 1996; Sanderink et al. 1988). They constitute a large group of proteins that are present in all three domains of life. Aminopeptidases can be found in the cytoplasm, embedded in the membranes, anchored to the cell membrane, or secreted out of the cell. Aminopeptidases play a role in several physiological processes. Some participate in the catabolism of exogenously supplied peptides, while others are required for protein turnover in vivo. In addition, some aminopeptidases are involved in specific functions. For example, the bacterial and archaeal methionine-aminopeptidases are responsible for the cleavage of N-terminal methionine from newly synthesized proteins.

Aminopeptidases vary in size, specificity, activity, and biophysical properties (Ferroa et al. 2014; Gonzales and Robert-Baudouy 1996; Sanderink et al. 1988). While some are active as monomers, others are multimeric. They also differ in the cofactors or catalytic residues needed for activity (for example, Zn-metalloenzyme, Ca-metalloenzyme, cysteine-enzyme, serine-enzyme). Another difference among aminopeptidases is specificity. While many aminopeptidases have broad specificity and remove any N-terminal amino acid, some are amino acid–specific and can only catalyze the removal of a specific amino acid such as Ala, Pro, Gly, or Met (Gonzales and Robert-Baudouy 1996).

The fact that aminopeptidases can recognize a specific amino acid makes them good candidates for development as a binding reagent, as these enzymes have also already evolved to bind peptides. And, although the enzyme will cleave the terminal amino acid, it has been shown that, at least in some cases, mutating the active site residue can prevent cleavage of the bound peptide while not affecting enzyme specificity (Thompson et al. 2003).

While the ability of aminopeptidases to bind peptides with a specific N-terminal amino acid could make them good candidates for development as NAAB affinity reagents, other aspects of the protein need to be considered, including oligomeric structure, size, and the cofactor required for binding. In addition, in some cases, the residues in the P2, P3, and P4 positions have been shown to influence protein binding to the terminal amino acid (Xiao et al. 2010). As already mentioned, any effect of the downstream amino acids on binding is not desirable for a NAAB affinity reagent. It is possible that protein engineering may enable the alteration of the enzymes and reduce the effect of the P2–P4 positions on peptide binding.

Antibodies and nanobodies

Antibodies and nanobodies (also known as single-domain antibodies) are known to bind their ligands with high affinity and specificity. Thus, one would assume that they may be good candidates to be developed as NAAB affinity reagents. However, antibodies are large (averaging 150 kDa), tetrameric complexes with multiple domains, which could significantly limit their use as NAAB affinity reagents for protein sequencing. Nanobodies, on the other hand, are small, about 15 kDa, and composed of only one domain. Nevertheless, the epitopes recognized by antibodies and nanobodies are fairly large, usually composed of several amino acids that are often not consecutive on the polypeptide chain. There are, however, exceptions, for example, the anti-phosphotyrosine antibodies that recognize phosphorylated tyrosine within a polypeptide chain but not specifically at the N-terminus of a protein/peptide (for example, Cobaugh et al. 2008; Mandell 2003). Although epitopes are usually composed of several amino acids, there are examples of much shorter epitopes (for example, an antibody that can recognize a di-Gly peptide; Wagner et al. 2011; Yoshida et al. 2015). Nevertheless, due to their limitations (Table 1), antibodies and nanobodies are not likely to be suitable for development into NAAB affinity reagents.

Nucleic acid–based NAAB affinity reagents

RNA aptamer

Riboswitches are segments of bacterial mRNA usually found in the 5′-untranslated regions (5′-UTRs) that play a regulatory role in the translation of proteins involved in metabolism and transport of metabolites (Breaker 2018; Sherwood and Henkin 2016). The small metabolite binds specifically to the riboswitch segment of the mRNA and directly regulates protein expression in response to the cellular concentration of the metabolite. In general, riboswitches consist of two parts, the aptamer domain that specifically binds the small metabolite and a second domain that is involved in regulating gene expression (Breaker 2018; Sherwood and Henkin 2016). Several mechanisms of riboswitch action have been studied, including the formation of termination signals when the level of the metabolite is high, resulting in aborted transcription. Another mechanism is the formation of an RNA structure that masks the ribosome binding site on the mRNA, thus preventing translation when the metabolite levels are high. It was also found that in some cases the riboswitch also functions as a ribozyme that cleaves itself (i.e., the mRNA) when its cognate metabolite levels are high.

Several riboswitches have specifically evolved to bind amino acids. Those include the Gln riboswitch (Weinberg et al. 2010), Gly riboswitch (Mandal et al. 2004), and Lys riboswitch (Mandal et al. 2003). Although the naturally occurring riboswitches evolved to bind only an amino acid and not a terminal amino acid on a peptide, they are potential candidates for further engineering to make them useful reagents. In addition, only the aptamer part of the riboswitch is needed to make a binding reagent and thus may be more amenable to modifications without affecting affinity to the specific amino acid. To date, naturally occurring riboswitches have been identified against only a few amino acids. However, studies have shown that aptamers that recognize other amino acids can be developed including those which recognize Val (Majerfeld and Yarus 1994), Ile (Legiewicz and Yarus 2005), Tyr (Mannironi et al. 2000), and other amino acids (McKeague and Derosa 2012). There are many selection protocols designed to isolate aptamers that bind specific targets from a large random sequence pool (Blind and Blank 2015). Based on the amino acid aptamers that have been studied, one would expect that with the right screening scheme aptamers that recognize other amino acids and/or peptides could be identified.

However, one of the major drawbacks of using RNA as a binding reagent is its stability. RNA is very sensitive to nucleases and other degradation processes (for example, exposure to basic pH). RNA therefore may not be stable during the sequencing process and may not be a good candidate for a binding reagent. The incorporation of modified nucleotides (such as 2′-O-methyl rNTPs), however, may be used to improve aptamer stability.

Other possible aptamers

DNA could also be developed into NAAB affinity reagents. Several DNA aptamers have been developed to bind small molecules with high affinity and specificity (McKeague and Derosa 2012) and thus may also be developed to bind specific amino acids. Although DNA is more stable and easier to manipulate, its folding is not as versatile as that of RNA, and thus it may not be possible to design a DNA molecule that binds N-terminal amino acids with the required specificity and affinity.

One can also consider using non-natural, nucleotide-based, NAAB affinity reagents such as PNA (peptide nucleic acid) and LNA (locked nucleic acid) (Jepsen et al. 2004; Siddiquee et al. 2015).

Concluding remarks

Although, currently, there is no working prototype of a next-generation protein sequencer, the field is moving forward at a rapid pace, and peptide fingerprinting is already a reality (Swaminathan et al. 2018; van Ginkel et al. 2018). For fluoro-sequencing based on sequential identification of the N-terminal amino acid of a polypeptide, one of the main challenges is the development of NAAB affinity reagents that bind N-terminal amino acids of a peptide or protein with high specificity and affinity. Here, we have described the properties required for NAAB affinity reagents and suggested some native systems that could form the basis for candidates to be developed as reagents. However, new candidates with the required properties are still to be discovered (for example, see the glycine-specific N-end rule pathway (Timms et al. 2019)). A combination of naturally evolving biomolecules, together with engineering approaches to improve their utility as NAAB affinity reagents, could provide the necessary tools to address this critical need to advance this approach to next-generation protein sequencing.