Introduction

Fluorescent proteins have revolutionized the ability of researchers to visualize subcellular structures and to study biochemical activities with unprecedented spatial and temporal resolution. Since the first use of the green fluorescent protein (GFP) outside its original host Aequorea victoria in 1994 (Chalfie et al. 1994), considerable efforts have been devoted to improving its spectral and biochemical properties and to optimizing its behavior in live cells and organisms. Nowadays, GFP and its color variants are routinely used by cell biologists to provide a wealth of information on a wide variety of biological processes. Because fluorescent proteins (FPs) are genetically encoded and do not require any substrates or cofactors, they are an ideal tool for specific protein labeling. Despite these advantages, FPs have a few shortcomings (Giepmans et al. 2006; Marks and Nolan 2006). First, with a molecular weight of 30 kDa, FPs can interfere with the folding, structure, localization, and function of the tagged protein. Second, as the chromophore of FPs is buried in their β-barrel structure, FPs are less amenable to modification without loss of fluorescence. Third, FPs are much less bright and photostable than the best organic fluorophores and therefore are not optimal for applications requiring a high photon output, such as single-molecule super-resolution imaging. Fourth, FPs generally show a narrower dynamic range than organic fluorophores in response to environmental changes. These drawbacks have motivated researchers to develop alternative chemical labeling strategies.

In chemical labeling, instead of using an FP, a dye-free protein domain or polypeptide sequence is genetically fused to the protein of interest. This protein domain or polypeptide sequence is subsequently modified with an organic fluorophore or other targeting molecule through covalent conjugation or high-affinity binding (Fig. 1). Thus, chemical labeling presents the combined advantages of specific labeling through the genetic fusion to the protein of interest and the superior sensitivity and versatility of synthesized and designed organic probes. In addition, the absence of an intrinsic fluorophore allows a single genetic fusion construct to be used for many distinct applications, simply by choosing different organic probes to combine with the tagged constructs, whether for in vitro or in vivo applications.

Fig. 1

Chemical labeling of proteins of interest. a The protein of interest is fused to a fluorescent protein with intrinsic fluorescence. b The protein of interest is fused to a protein domain tag, which can bind to a fluorophore derivative through covalent or non-covalent interactions. c The protein of interest is fused to a peptide tag, which then acts as the enzyme substrate for covalent fluorophore attachment. d The protein of interest is fused to a protein or peptide tag, which then binds and activates a fluorogen through covalent or non-covalent interactions

In 2014, the Nobel Prize in Chemistry recognized super-resolution imaging, a breakthrough in optical microscopy that enables researchers to visualize and investigate biological processes at the individual molecule level inside living cells. Such achievements would be impossible without the innovation of chemical labeling in the past two decades, delivering dyes with high brightness and photostability to targeted sites in living cells and organisms. In this review, we focus on the development of various chemical labeling strategies and technologies and highlight the unique properties and advantages of chemical tags in recent applications (Table 1). With the unforeseeable potential of chemical labeling techniques, we expect further innovative methods will soon be applied to solve sophisticated biological problems.

Table 1 Properties and major applications of chemical tags discussed in this review (TMP–eDHFR Trimethoprim-Escherichia coli dihydrofolate reductase, PYP photoactive yellow protein, AP acceptor peptide, FAP fluorogen activating protein, FRET Förster resonance energy transfer, QDs quantum dots, LRET luminescence resonance energy transfer, CALI chromophore-assisted light inactivation)

Protein-domain-based chemical tags

In this labeling strategy, a protein domain is fused to the target protein to subsequently incorporate the fluorescent label. One of the most widely used chemical tags in this class is the SNAP tag developed by Johnsson’s group (Keppler et al. 2003). The SNAP tag is based on a DNA repair protein, human O6-alkylguanine-DNA-alkyltransferase (hAGT), a 20-kDa protein that transfers the alkyl group on the O6 position of benzyl-guanine to a reactive cysteine residue. By employing random mutagenesis followed by phage display and a yeast three-hybrid assay, an AGT mutant has been identified that shows a 17-fold and 52-fold faster reactivity than the starting mutant and the wild-type protein, respectively (Gronemeyer et al. 2006). In addition, the mutant AGT accepts a large number of derivatized substrates with various fluorescent labels, shows a low affinity towards DNA, and is stable in an oxidizing environment, such as the cell surface. To date, various SNAP tag fusion proteins have been generated both extracellularly and in the cytoplasm and have been applied in recently developed super-resolution imaging and single-particle tracking (Benke et al. 2012). Moreover, an AGT variant that recognizes benzyl-cytosine as the substrate has also been established and is named the CLIP tag (Gautier et al. 2008). The SNAP tag and CLIP tag act orthogonally and have been used to detect protein-protein interactions by selective cross-linking: the interaction between two partners, one fused to the SNAP tag and the other to the CLIP tag, has been visualized by a bifunctional molecule composed of the two tag substrates connected by a fluorophore. This molecule therefore bridges the interacting partners with high selectivity (Gautier et al. 2009). When linked together, the SNAP tag and CLIP tag can act as a Förster resonance energy transfer (FRET) pair and be used to detect changes in the concentration of a metabolite (Brun et al. 2009, 2011).
The SNAP tag has also been engineered to sense local calcium changes when a BODIPY (boron-dipyrromethene)-based calcium sensor BOCA is conjugated to the SNAP substrate BG, leading to a 180-fold increase in fluorescence signal in response to calcium binding (Kamiya and Johnsson 2010). More recently, the SNAP-tag-labeling technology has been applied to living organisms, for example, the zebrafish embryo (Campos et al. 2011). After injecting the mRNA encoding SNAP-fusion proteins at an early stage of embryo development, subcellular structures can later be visualized by fluorescence microscopy at various developmental stages. Furthermore, SNAP-tag-labeled embryos can tolerate acid-based fixations that have been shown to be incompatible with fluorescent proteins. As with any chemical labeling strategy, multiple wash steps are required in SNAP tagging to remove the unreacted fluorophore in order to minimize the background signal. To overcome this limitation, researchers have developed fluorogenic SNAP probes for wash-free applications. In 2011, Sun et al. reported the design of a fluorogenic SNAP tag in which a quencher is linked to the guanine group (Sun et al. 2011). As a result, after reacting with AGT, the quencher-bound guanine acts as the leaving group, whereas the fluorophore is now covalently linked to the protein tag AGT. Another fluorogenic SNAP tag has been generated by taking advantage of a polarity-sensitive, solvatochromic dye: Nile Red (Prifti et al. 2014). Nile Red shows a strong increase in fluorescence when exposed to a low-polarity environment, such as the plasma membrane (Greenspan and Fowler 1985). When conjugated to a BG derivative, Nile Red can be targeted to the SNAP-receptor protein on the plasma membrane and become fluorescent (Prifti et al. 2014). These fluorogenic labeling systems are therefore well-suited for tracking dynamic cellular events in real time.

SNAP-tag-mediated genetic targeting has been recently applied to animals (Yang et al. 2014). The SNAP tag was fused to a membrane-associated CaaX sequence and then targeted to the Rosa26 locus in mice. When such a Rosa Snapcaax line was crossed with lines expressing Cre in various neuronal tissues, specific fluorescence signal was detected on neuronal endings in the skin or cornea during live tissue confocal imaging after the tissues were subjected to BG substrate treatment. Moreover, when used with a photosensitizing substrate, namely BG-fluorescein, in chromophore-assisted light inactivation (CALI) experiments, axons with SNAPCaax expression showed complete degeneration and loss of mechanical sensory functions upon illumination. Such light-mediated cell ablation is highly specific and selective because of the genetic targeting of the SNAP tag in Cre-expressing tissues and local administration of the substrates. Therefore, these SNAP-tag-expressing animals provide an invaluable tool for in vivo functional studies at the tissue level.

Another protein-based self-labeling tag is the Halo tag. The Halo tag protein (33 kDa) is based on the bacterial enzyme haloalkane dehalogenase, which functions to remove halides from haloalkanes in a two-step process (Los et al. 2008). Mutation at His272 in the wild-type dehalogenase has been found to lead to a trapped reaction intermediate, with the alkyl group being covalently linked to an aspartate residue in the enzyme (Pries et al. 1995). A modified haloalkane dehalogenase has been developed that specifically recognizes chloroalkanes linked to fluorophores, biotin, and agarose beads (Los and Wood 2007). Because haloalkane dehalogenase is of prokaryotic origin, the background in mammalian cells is low. The Halo tag technology has been applied to label β1-integrin with the substrate linked to Atto655 for STED (stimulated emission depletion) microscopy (Schroder et al. 2009), to investigate peroxisome dynamics (Huybrechts et al. 2009), to image tumors in various colors in live animals (Kosaka et al. 2009), and to label proteins in bacteria and mammalian cells for super-resolution imaging (Lee et al. 2010). In addition, the Halo tag has proven useful for CALI applications that use an eosin ligand and for pulse-chase experiments that examine the stability of receptor proteins (He et al. 2011). As a functional extension of this tagging approach, Neklesa et al. (2011) designed a small-molecule-induced degradation system in which a hydrophobic moiety is linked to the Halo tag ligand. As a result, the hydrophobic group is anchored on the surface of the Halo tag fusion protein, leading to its degradation in the proteasome through the cellular quality control mechanism. This hydrophobic tagging-induced protein degradation has been demonstrated in cultured cells and in zebrafish embryos and can be used to suppress tumor progression by selectively removing tagged oncogenes in mouse models (Neklesa et al. 2011).

In addition to covalent interactions between the ligand and the protein tag, non-covalent interactions have also been exploited for protein labeling, given the high affinity and selectivity of the interaction. Cornish and colleagues took advantage of a well-characterized interaction pair: methotrexate and the Escherichia coli dihydrofolate reductase (eDHFR; Miller et al. 2004). eDHFR is a relatively small protein tag with a molecular weight of 18 kDa. Successful labeling was achieved when a protein of interest was fused with eDHFR and stained with Texas Red-methotrexate. To overcome the potential toxicity and background signal from labeling endogenous DHFR in mammalian cells, trimethoprim (TMP) was later used to replace methotrexate (Miller et al. 2005). Specifically, TMP has a high affinity for eDHFR (KD of 1 nM), but a significantly lower affinity (KD >1 μM) for the mammalian counterpart. In addition to protein labeling, TMP has been conjugated to the photoswitchable organic fluorophore ATTO655 for dSTORM (direct stochastic optical reconstruction microscopy) super-resolution imaging, and ~20 nm resolution of the human histone H2B protein has been achieved in live cells (Wombacher et al. 2010). Furthermore, the TMP tag has been combined with the SNAP tag to label numerous spliceosomal proteins in yeast cell extract; this has allowed the elucidation of the kinetics and order of the spliceosome assembly pathway (Hoskins et al. 2011). Such studies would be impossible to carry out with low-photon-output blinking fluorescent proteins. The TMP tag has also been successfully used in other fluorescence-related applications. When TMP is linked to a terbium complex, luminescence resonance energy transfer from eDHFR fusion proteins to GFP is detected and can be used to visualize protein-protein interactions (Rajapakse et al. 2010).
TMP-fluorescein has proven to be an effective fluorophore for CALI and has been used to examine the function of non-muscle myosin II in maintaining a coherent cytoskeleton network (Cai et al. 2010). Long et al. (2011) have used the TMP-eDHFR pair to improve the biological compatibility of iron oxide nanoparticles for studying focal adhesions in response to magnetic force. Magnetic nanoparticles were coated with TMP through dopamine anchors and then introduced into cells expressing eDHFR. Cells detached from the plate after being under magnetic attraction for 4 h, suggesting that TMP-eDHFR nanoparticles are a useful tool to manipulate cells in a noncontact manner. In another application, a general strategy for inhibitor-mediated protein degradation was developed, and TMP-Boc3Arg (tert-butyl carbamate-protected arginine) induced rapid and robust degradation of eDHFR in the cell lysate, probably caused by tagging eDHFR with the hydrophobic Boc3Arg moiety (Long et al. 2012). Recently, a covalent TMP tag was developed by installing on TMP a latent acrylamide electrophile that reacts with a Cys residue just outside the binding pocket in eDHFR through proximity-induced reactivity (Gallagher et al. 2009). The positions of the Cys residue and the acrylamide electrophile were further optimized to give rise to the second-generation covalent TMP tag, which showed a much faster labeling half-life (t1/2 = 8 min) and targeted a number of intracellular proteins robustly (Chen et al. 2012). The covalent TMP tag therefore will benefit applications that require long-term or permanent protein labeling.
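
The affinity and half-life figures quoted for the TMP tag can be turned into rough numbers with a short calculation. The sketch below is only an illustration: it assumes simple 1:1 equilibrium binding with the probe in large excess over the tag, and it treats covalent labeling as a first-order process described solely by its half-life. Neither model is taken from the cited studies, and the 10 nM probe concentration and the function names are hypothetical.

```python
import math


def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Equilibrium occupancy for 1:1 binding with ligand in excess: L / (L + KD)."""
    return ligand_nM / (ligand_nM + kd_nM)


def fraction_labeled(t_min: float, half_life_min: float) -> float:
    """First-order labeling progress: 1 - exp(-ln(2) * t / t_half)."""
    return 1.0 - math.exp(-math.log(2) * t_min / half_life_min)


PROBE_NM = 10.0            # hypothetical probe concentration (assumption)
KD_EDHFR_NM = 1.0          # TMP affinity for eDHFR quoted in the text
KD_MAMMALIAN_NM = 1000.0   # text gives ">1 uM", so this is a lower bound on KD
HALF_LIFE_MIN = 8.0        # second-generation covalent TMP tag, t1/2 = 8 min

occupancy_edhfr = fraction_bound(PROBE_NM, KD_EDHFR_NM)
# Because the mammalian KD is only a lower bound, this occupancy is an upper bound.
occupancy_mammalian_max = fraction_bound(PROBE_NM, KD_MAMMALIAN_NM)

print(f"eDHFR occupancy at {PROBE_NM:.0f} nM probe: {occupancy_edhfr:.3f}")
print(f"mammalian DHFR occupancy (upper bound): {occupancy_mammalian_max:.4f}")
for t in (8, 16, 30):
    print(f"fraction labeled after {t} min: {fraction_labeled(t, HALF_LIFE_MIN):.3f}")
```

Under these assumptions, the >1000-fold difference in KD translates into near-saturating occupancy of the eDHFR tag at a probe concentration where the endogenous mammalian enzyme is bound at about 1% or less.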

Jing and Cornish (2013) further extended the eDHFR-TMP system to develop a fluorogenic TMP tag with a significantly improved signal-to-noise ratio for live-cell imaging. This fluorogenic tag (TMP-BHQ1-Atto520) contains TMP, Atto520 as the fluorophore, and BHQ1 as the quencher. In the presence of eDHFR, a proximity-induced SN2 reaction takes place between a Cys (L28C) as the nucleophile on eDHFR and a tosylate linker in the tag, leading to the cleavage of the quencher and a 20-fold fluorescence enhancement. When compared with the non-fluorogenic tag TMP-Atto520, TMP-BHQ1-Atto520 showed reduced non-specific background on the cells. With the modular design of the fluorogenic TMP tag, we anticipate that a wide variety of fluorophores with diverse properties and functions can be readily incorporated.

Hori et al. (2009, 2013) developed another fluorogenic system utilizing the photoactive yellow protein (PYP) as the protein tag. PYP is a small protein (14 kDa) that was discovered in purple bacteria (Kumauchi et al. 2008) and has been found to bind to its natural substrate CoA thioester of 4-hydroxycinnamic acid and derivatives of coumarin through transthioesterification with Cys69 (Kyndt et al. 2002). A rational probe design led to FCTP, a fluorogenic probe composed of 7-hydroxycoumarin-3-carboxylic acid thioester and fluorescein with ethylene glycol as a flexible linker in between (Hori et al. 2009). In the absence of PYP, FCTP is not fluorescent because of the intramolecular quenching of the two dyes. However, upon binding to PYP, the coumarin moiety is covalently attached to PYP and separated from fluorescein, leading to markedly enhanced fluorescence of fluorescein. The next generation PYP probes are based on environment-sensitive dialkylaminocoumarin derivatives (Hori et al. 2013). Such probes are not fluorescent in a polar environment but become intensely fluorescent in a low-polarity environment, for example, when buried in the binding pocket of the PYP tag. Two of these probes with increased water solubility (TMBDMA and CMBDMA) have shown specific and quick labeling of intracellular proteins without washing (Hori et al. 2013). TMBDMA and CMBDMA have also been used to image DNA methylation in live cells (Hori et al. 2013).

Nolan and colleagues created another chemical tag by taking advantage of the interaction between the receptor-ligand pair FKBP12(F36V) and SLF’ (Marks et al. 2004a). FKBP12 is a 12-kDa immunophilin. The underlying "bump-and-hole" rationale is that, when bulky groups (“bumps”) are introduced to synthetic ligands of the wild-type FKBP12, the binding affinity decreases, whereas compensating “holes” engineered into FKBP12 by mutation will restore the interaction with the bumped synthetic ligands. Based on molecular design and remodeling, Clackson et al. (1998) identified a synthetic ligand SLF’ exhibiting a KD of 0.094 nM for the mutant FKBP12(F36V), a 1000-fold increased selectivity over the wild-type protein. A fluorescein SLF’ derivative was used to label FKBP12(F36V) fusion proteins expressed at various levels, and CALI was performed to abolish the enzyme activity of β-galactosidase (Marks et al. 2004a). Robers et al. (2009) optimized the labeling conditions of the FKBP12(F36V) tag and demonstrated its feasibility as a partner with fluorescent proteins in FRET-based assays.

As an alternative to molecular modeling, Marks et al. (2004b) set out to screen a phage display library to select for peptides that exhibit high affinity to the fluorophore Texas Red. The screening yielded two peptides (a 9-mer and a 13-mer) that bind to Texas Red with subnanomolar affinity. Because X-rhod dyes have the Texas Red chromophore structure and a BAPTA calcium-sensing moiety, the authors used the identified peptide to create fusion proteins to sense the local calcium change in cultured cells. This approach is another example that provides genetic targeting of small molecules in the cellular milieu.

Metal-chelation-based peptide tags

The selective interaction between oligo-histidine (His6 and His10) and Ni(II)-nitrilotriacetic acid (NTA) has become an indispensable tool for recombinant protein purification and surface immobilization. The His tag was recently extended to labeling proteins on the cell surface (Goldsmith et al. 2006; Guignet et al. 2004) and to single-molecule FRET in vitro (DeRocco et al. 2010) by using chromophore-conjugated Ni-NTA. To improve the binding affinity of Ni-NTA to poly-His, Lata et al. (2005) synthesized multivalent chelator headgroups containing two, three, and four NTA moieties. TRIS-NTA showed a KD of ~20 nM, an enhancement in affinity by three orders of magnitude compared with mono-NTA. Multivalent interactions of TRIS-NTA conjugates and poly-His are highly stable, leading to complex lifetimes of >1 h (Lata et al. 2005). However, all of the reported NTA chromophores show a loss of fluorescence upon complexing with Ni(II), because of its paramagnetic nature. Peneva et al. (2008) have reported a perylene (dicarboximide) dye connected to NTA; this retains its fluorescence upon binding to Ni(II). The noncovalent interaction between poly-His and Ni-NTA has been rendered covalent recently through a proximity-induced nucleophilic reaction: the electrophilic tosyl group in the NTA moiety forms a covalent bond with a His residue in the tag, releasing Ni-NTA (Uchinomiya et al. 2009). To overcome the drawbacks associated with Ni(II), namely fluorescence quenching and cytotoxicity, Tsien and colleagues developed the dye HisZiFiT based on a fluorescein scaffold that utilizes Zn(II), instead of Ni(II), to bind to His6 (Hauser and Tsien 2007). Zn(II) is diamagnetic and present in biological systems and is thus better-suited for in vivo protein labeling than Ni(II). HisZiFiT shows a KD of ~40 nM for His6 and has been used to address a controversy concerning stromal interaction molecule 1 (Hauser and Tsien 2007).
These metal chelating chemical tags have so far been limited to labeling proteins on the cell surface, because of the impermeability of the probes and the potential toxicity of the metal.

Peptide tags mediated by enzymes

Peptide tags are generally less invasive than protein tags because of their small size. However, the desired specificity can be difficult to obtain with a peptide tag. For example, in the FlAsH labeling approach, extensive wash steps are required to remove dye bound to intracellular targets whose sequences resemble the tetracysteine recognition motif (Hoffmann et al. 2010). To overcome the shortcomings of peptide tags, alternative labeling methodologies have been developed by utilizing enzymes. The specificity of such labeling is conferred by the peptide sequence, which serves as the substrate of the enzyme. The Ting lab has established the use of the E. coli biotin ligase (BirA) as one of the most advanced enzyme-mediated chemical tags (Chen et al. 2005). BirA recognizes and ligates biotin to the lysine side-chain in a 15-amino-acid acceptor peptide (AP) sequence; this allows subsequent modification with a functional group or fluorophore. Chen et al. (2005) have shown that ketone 1, a biotin analog, can also be used as a BirA substrate. Ketone 1 is not present on natural proteins, and endogenous proteins are not labeled under the same conditions. Although biotinylation of a target protein has been demonstrated in various cellular compartments (Howarth and Ting 2008), protein labeling with BirA has been restricted to the cell surface mainly for two reasons: (1) endogenous biotin and small molecules containing ketone in the cell would compete with the functionalized substrate; (2) because BirA has a high specificity towards its limited substrate, the labeling of the AP needs to be completed in two steps, and the second step is slow and inefficient in the cellular context. Chen et al. (2007) later identified an orthogonal system to BirA and AP, namely the yeast biotin ligase (yBL) and yeast acceptor peptide (yAP), through screening a phage display library.
Combining the two systems allows the simultaneous labeling of two independent receptor proteins in the same cell with quantum dots (QDs). However, yAP has a low catalytic value, and the catalytic efficiency of the yAP-yBL pair is about 800-fold lower than that of the BirA-AP pair. To expand the substrate range of biotin ligase, Slavoff et al. (2008) have examined the biotinylation reaction with biotin ligases from nine species and eight biotin isosteres, by using p67, a domain from one of the human biotin acceptor proteins. Biotin ligases from Saccharomyces cerevisiae and Pyrococcus horikoshii have been found to accept alkyne and azide biotin analogs. These newly identified enzymes and substrates open up new avenues for protein labeling.

Another chemical tagging approach mediated by enzymes utilizes lipoic acid ligase. E. coli lipoic acid ligase (LplA) naturally couples lipoic acid to a lysine side chain in E2p, E2o, and H-protein involved in oxidative metabolism (Green et al. 1995). Ting and colleagues have re-engineered LplA to accept an alkyl azide that can then be selectively functionalized through a strain-promoted [3 + 2] cycloaddition (Fernandez-Suarez et al. 2007). The LplA acceptor peptide (LAP1), a 22-amino-acid sequence, has also been developed to replace the original large protein substrates. Although LplA is mechanistically similar to BirA, adequate bioorthogonality has been demonstrated to exist between the two labeling approaches, and two receptors can be tagged simultaneously in the same cell (Fernandez-Suarez et al. 2007). LplA has also been employed to ligate a fluorinated aryl azide photo-cross-linker to detect protein-protein interactions (Baruah et al. 2008). The peptide sequence for LplA has further been decreased to 13 amino acids through screening a yeast cell surface LAP library (Puthenveetil et al. 2009). A novel peptide sequence LAP2 has been identified by fluorescence-activated cell sorting (FACS) and shows improved kinetics over LAP1. Uttamapinant et al. (2010) extended LplA labeling from the cell surface to the interior of living cells. A blue fluorophore, 7-hydroxycoumarin, was chosen for intracellular labeling because of its small size and hydrophobicity. The authors then created several LplA mutants based on the crystal structure of the LplA-lipoic acid complex. Two of the mutants, LplA(W37V) and LplA(W37I) showed robust labeling of LAP2 fusion proteins, both from the cell lysate and when targeted to a number of cellular structures. This approach, called PRIME (probe incorporation mediated by enzymes) established, for the first time, that a fluorophore can be ligated to the substrate in one step by a ligase. 
Although PRIME is an attractive methodology for specific intracellular labeling based on LplA, it is limited by coumarin being the only available substrate. Coumarin is not an ideal fluorophore for live cell microscopy because of its spectral properties. To incorporate a diverse set of fluorophores, Yao et al. (2012) exploited 10-azidodecanoic acid as the substrate for LplA(W37I) and the functional group handle to react with fluorescent dyes ranging from fluorescein to ATTO 647N in a two-step scheme. This labeling approach is limited by the kinetics of the second step, the strain-promoted cycloaddition reaction, and might not be compatible with fast intracellular dynamics. Copper-catalyzed azide-alkyne cycloaddition (CuAAC) is at least ten times faster than strain-promoted cycloaddition (Uttamapinant et al. 2012). However, CuAAC has been limited to in vitro applications, because of the toxicity of copper(I) (Wolbers et al. 2006). To address the toxicity issue of copper(I) and to maintain the kinetics of CuAAC, Uttamapinant et al. (2012) explored the possibility of using a copper-chelating azide. When picolyl azide 8 was coupled to PRIME labeling, fast and site-specific protein labeling and RNA labeling were achieved. The newest generation copper(I) ligand, BTTAA, further accelerated the reaction (Soriano Del Amo et al. 2010). Thus, chelation-assisted CuAAC enables biocompatible labeling with low concentrations of copper(I), opening the door to the labeling of highly sensitive cells and tissues, such as neuron cultures. PRIME labeling has also been extended to resorufin, a small bright red phenoxazine dye (Liu et al. 2014). Computational design was employed to search for resorufin ligases by using the original LplA sequence as a template. A triple LplA mutant (E20A/F147A/H149G) emerged from in silico analysis and was able to carry out covalent attachment of resorufin to LAP specifically and efficiently.
In addition to conventional protein labeling and fluorescence imaging in living cells, this resorufin ligase was used to perform STED-based super-resolution imaging and electron microscopy (Liu et al. 2014).

Phosphopantetheine transferases (PPTases) from E. coli (AcpS) and Bacillus subtilis (Sfp) have also been used to selectively label cell surface proteins. The natural function of PPTase is to covalently attach a phosphopantetheine (Ppant) from coenzyme A (CoA) to a serine residue on acyl- and peptidyl-carrier proteins (ACP and PCP). Carrier proteins play an important role in fatty acid synthesis, nonribosomal peptide synthesis, polyketide synthesis, and lysine biosynthesis (Ehmann et al. 1999). Ample biochemical and structural evidence suggests that the β-mercaptoethylamine group of CoA is not located in the binding pocket of PPTase and is not responsible for the transfer of Ppant to carrier proteins (Gehring et al. 1997; Parris et al. 2000; Reuter et al. 1999). Therefore, a number of functionalities, including haptens, fluorophores, affinity labels, and QDs, have been introduced to the substrate CoA through a maleimide-thiol reaction (George et al. 2004; Johnsson et al. 2005). AcpS and Sfp have been employed for multi-color labeling to selectively tag two different proteins (Vivero-Pol et al. 2005). ACP and PCP domains are 80–100 residues in size. An 11-residue DSLEFIASKLA sequence (ybbR) has been found to be sufficient to be modified by Sfp, which dramatically reduces the tag size (Yin et al. 2005). Zhou et al. (2007) have performed phage display screening and identified two 12-residue clones, namely S6 and A1, that can be selectively recognized by Sfp and AcpS, respectively. S6 and A1 are orthogonal to each other, permitting sequential labeling of two proteins with little cross-reactivity. The A1 peptide has been used to label the membrane pool of Smoothened; this has enabled the study of its trafficking pattern in response to hedgehog (Wang et al. 2009). Furthermore, based on nuclear magnetic resonance (NMR) studies of the A1 peptide, an 8-residue peptide has been generated that undergoes efficient Ppant conjugation with AcpS (Zhou et al. 2008).

Other enzyme-mediated chemical tags include the transglutaminase tag (Lin and Ting 2006), the sortase tag (Popp et al. 2007), and the formylglycine-generating enzyme (FGE) tag (Wu et al. 2009). These labeling approaches have so far been only applicable to cell surface proteins and in vitro tagging.
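
The tag sizes quoted so far can be collected for a side-by-side comparison. The following sketch lists the protein tags by the molecular weights given in the text and converts the peptide tag lengths to approximate masses. The conversion factor of 110 Da per residue is a common rule of thumb introduced here as an assumption, not a value from the cited work, so the peptide masses are rough estimates only.

```python
# Rule-of-thumb average mass per amino acid residue (assumption, not from the text).
AVG_RESIDUE_KDA = 0.110

# Molecular weights (kDa) as quoted in the text.
protein_tags_kda = {
    "Halo tag": 33.0,
    "fluorescent protein": 30.0,
    "SNAP tag (hAGT)": 20.0,
    "eDHFR": 18.0,
    "PYP": 14.0,
    "FKBP12": 12.0,
}

# Peptide tag lengths (residues) as quoted in the text.
peptide_tags_residues = {
    "LAP1": 22,
    "AP": 15,
    "LAP2": 13,
    "S6 / A1": 12,
    "ybbR": 11,
    "AcpS 8-mer": 8,
}

YBBR_SEQUENCE = "DSLEFIASKLA"  # ybbR sequence given in the text


def approx_mass_kda(n_residues: int) -> float:
    """Estimate peptide mass from its length using the average residue mass."""
    return n_residues * AVG_RESIDUE_KDA


# Merge both groups into one table of (approximate) masses in kDa.
sizes = dict(protein_tags_kda)
sizes.update({name: approx_mass_kda(n) for name, n in peptide_tags_residues.items()})

for name, kda in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<22s} {kda:5.1f} kDa")
```

Even with this crude estimate, the enzyme-substrate peptide tags come out roughly an order of magnitude smaller than the protein-domain tags, which is the basis for describing them as less invasive.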

Fluorogen-based chemical tags

The fact that these fluorescent-dye-labeled chemical tags require washing to remove unbound dye has led to the development of fluorogenic labeling agents, i.e., those that convert from a weakly fluorescent form to a highly fluorescent form upon binding to a genetically encoded protein or peptide domain. The very first fluorogenic chemical tag was based on the tetracysteine sequence (CCXXCC), the TC tag, reported by Tsien and colleagues in 1998 (Griffin et al. 1998). The TC tag is recognized by a membrane-permeable bis-arsenical ligand FlAsH (fluorescein arsenical hairpin binder) and its red-shifted analog ReAsH (resorufin arsenical hairpin binder). FlAsH and ReAsH are quenched when complexed with the ligand ethanedithiol (EDT) but exhibit strong fluorescence and high affinity upon binding to the TC tag (Luedtke et al. 2007). The original design of the TC sequence featured the four cysteines placed on one side of the α-helix, assuming that cysteines at i, i + 1, i + 4 and i + 5 positions would accommodate bis-arsenical binding (Griffin et al. 1998). However, substitution of the two central amino acids with Pro-Gly was later found to lead to an increase in quantum yield, association rate, and affinity (apparent KD = 4 pM; Adams et al. 2002). This result was surprising because Pro-Gly is found in β-turns, rather than α-helices, suggesting that the preferred conformation for the tetracysteine motif is a hairpin turn, not an α-helix. Notably, the reversal of Gly and Pro resulted in the formation of a complex with decreased brightness and stability (Adams et al. 2002). Because FlAsH and ReAsH bind to intracellular proteins with cysteine pairs nonspecifically, the sample has to be destained with dithiols such as EDT or dimercaptopropanol (BAL), thereby limiting the use of the TC tag in animals (Hoffmann et al. 2010).
In order to decrease the nonspecific background and increase the detection sensitivity for proteins expressed at low levels, a retroviral library containing various sequences flanking the TC motif was constructed (Martin et al. 2005). The library was then expressed in mammalian cells and a high concentration of dithiol was applied to screen for clones exhibiting improved dithiol resistance by FACS. Two sequences, namely FLNCCPGCCMEP and HRWCCPGCCKTF, were identified that showed higher signal to noise ratio when directly fused to β-actin under stringent dithiol wash conditions (Martin et al. 2005). NMR studies focusing on FLNCCPGCCMEP and ReAsH revealed that the central CPGC backbone formed a structure similar to a β-turn type II, and one arsenic bound to i and i + 1 thiols and another to i + 4 and i + 5 thiols (Madani et al. 2009). The structural requirements for arsenic display and fluorescence have mainly limited the TC ligands to FlAsH and ReAsH. For example, several rhodamine analogs of FlAsH were found to be non-fluorescent, even in the presence of TC peptide (Adams et al. 2002). Bhunia and Miller (2007) designed a general platform that separated the bis-arsenical targeting moiety from the fluorescent label, permitting the synthesis of a wide variety of SplAsH dyes (spirolactam arsenical hairpin binder). Unlike FlAsH and ReAsH, free SplAsH dyes are still fluorescent. Ying and Branchaud (2011) later adopted this strategy and synthesized SplAsH-biotin for affinity purification of TC-tagged proteins.
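
The tetracysteine motifs described above are simple enough to express as sequence patterns. The sketch below scans a protein sequence for the generic CCXXCC motif and for the optimized CCPGCC core, using the two 12-residue sequences reported by Martin et al. (2005) as inputs. The function and pattern names are hypothetical, and such a scan only locates candidate motifs; it says nothing about whether a given site would actually bind FlAsH or ReAsH in a folded protein.

```python
import re

# Generic tetracysteine motif: Cys-Cys-X-X-Cys-Cys, X being any residue (one-letter code).
TC_GENERIC = re.compile(r"CC[A-Z]{2}CC")
# Optimized core with Pro-Gly as the two central residues.
TC_OPTIMIZED = re.compile(r"CCPGCC")


def find_tc_motifs(sequence, pattern=TC_GENERIC):
    """Return (start_index, matched_text) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


# The two dithiol-resistant sequences named in the text.
optimized_tags = ("FLNCCPGCCMEP", "HRWCCPGCCKTF")

for tag in optimized_tags:
    print(tag, find_tc_motifs(tag, TC_OPTIMIZED))

# The reversed Gly-Pro variant still fits the generic motif but not the optimized core.
print(find_tc_motifs("CCGPCC"), find_tc_motifs("CCGPCC", TC_OPTIMIZED))
```

The same generic pattern could be run against a host proteome to flag endogenous sequences resembling the TC motif, although the nonspecific background discussed in the text arises from cysteine pairs more broadly and would not be fully captured by this pattern.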

Because the thiol group on cysteine is required to react with FlAsH and ReAsH, TC labeling has been mainly used for intracellular proteins in a reducing environment. To label proteins in the endoplasmic reticulum (ER), in the Golgi, and on the cell surface, strong reducing reagents such as phosphines need to be added to cells acutely (Hoffmann et al. 2010). The TC tag has been widely used for protein labeling, including examples in which protein function is less disrupted by the small TC tag than by fluorescent proteins (Andresen et al. 2004; Das et al. 2009; Dyachok et al. 2006; Enninga et al. 2005). In addition, the TC tag has been used to address a number of questions in cell biology. For example, FlAsH and ReAsH have been used in pulse-chase experiments to differentiate young and old connexin43 and its assembly pattern in gap junctions (Gaietta et al. 2002). FlAsH and ReAsH have also been paired with fluorescent proteins, for example, cyan fluorescent protein (CFP) and GFP, respectively, to study conformation changes of individual proteins and protein-protein interactions (Bhattacharya et al. 2012; Hoffmann et al. 2005; Inobe et al. 2011; Roberti et al. 2007). FlAsH and ReAsH have also been paired with each other for FRET experiments by exploiting the different binding affinities between two TC motifs to label simultaneously two proteins in the cell (Zurn et al. 2010). Further, FlAsH and ReAsH have been used as efficient photosensitizers to inactivate TC-tagged proteins in CALI experiments (Kasprowicz et al. 2008; Marek and Davis 2002; Tour et al. 2003). Moreover, ReAsH shows a high output of singlet oxygen and has been successfully used to photoconvert diaminobenzidine tetrahydrochloride (DAB) in correlated light and electron microscopy (Gaietta et al. 2002; Gaietta et al. 2006; Lichtenstein et al. 2009). 
In a recent study, two FlAsH molecules were linked together to create xCrAsH (a dimeric biarsenical derivative of carboxyfluorescein) for cross-linking protein pairs and monitoring protein-protein interactions (Rutkowska et al. 2011).

The finding that any mutations at the two central positions of the TC sequence cause a decrease in binding affinity and brightness suggests that PG can be replaced with a large amino acid sequence between the two cysteine pairs, provided that the conformation of the cysteine pairs in three dimensions (3D) recapitulates the orientation and geometry required in the linear TC motif for fluorescent labeling. This hypothesis has led to the development of bipartite TC display by Schepartz and colleagues (Luedtke et al. 2007). The two C-C pairs can be split and placed in separate regions of one protein to report conformational change or on two different proteins to report protein-protein interactions. In a proof-of-principle study, Luedtke et al. (2007) showed that model proteins containing the bipartite TC motif formed stable complexes with FlAsH and ReAsH and exhibited strong fluorescence, but only when these proteins were folded properly. Proteins with single point mutations were misfolded or misassembled and lost both affinity and brightness. To test the feasibility of bipartite TC display in non-α-helical structures, C-C motifs were introduced into loops of p53, and C-X-C motifs were introduced into adjacent β-strands of EmGFP (Goodman et al. 2009). Limited binding with ReAsH was observed with EmGFP variants, whereas p53 variants favored ReAsH binding significantly, suggesting that flexible regions, such as loops that are close in 3D, are good binding sites for bipartite TC display. The cysteine split display strategy has been employed in a number of applications: for example, to report on the folding state of cellular retinoic acid-binding protein (CRABP; Krishnan and Gierasch 2008), to sense Src kinase phosphorylation (Ray-Saha and Schepartz 2010), to detect amyloid-β aggregation and fibril formation in Alzheimer’s disease (Lee et al. 2011), to quantify protein dimerization on the plasma membrane (Pace et al. 2011), and to study the conformational change of epidermal growth factor receptor (EGFR) in response to various ligands (Scheck et al. 2012). Similar to linear TC tagging, high-resolution electron microscopy can be applied to bipartite TC display on interacting proteins (Dexter and Schepartz 2010).

By analogy with the arsenic-TC interaction, another short peptide tag was developed utilizing the bis-boronic acid rhodamine dye (RhoBo) and the tetraserine sequence SSPGSS (Halo et al. 2009). Although originally designed as a monosaccharide sensor (Kim et al. 2003), RhoBo has an affinity for the tetraserine sequence at least 10^5-fold higher than for a number of monosaccharides and shows limited fluorescence on the cell surface, a saccharide-rich environment (Halo et al. 2009). Since RhoBo does not use the toxic element arsenic and is inert to redox conditions, it is an attractive fluorogenic probe for protein labeling. However, because more than 100 human proteins contain the SSPGSS motif, further work is required to identify optimized tetraserine motifs for highly specific, targeted intracellular labeling.

Szent-Gyorgyi et al. (2008) have developed a new dye-protein system based on the fluorogen concept. Fluorogen-activating proteins (FAPs; ~25 kDa) are based on specific molecular recognition and activation between a single-chain antibody fragment and small, otherwise non-fluorescent, dye molecules such as thiazole orange (TO) and malachite green (MG; Szent-Gyorgyi et al. 2008). TO and MG are known fluorogens that interact with DNA (Nygren et al. 1998) or RNA aptamers (Babendure et al. 2003). When free in solution, these fluorogens have extremely low quantum yields, because of the free rotation of aromatic moieties within the chromophore leading to excited state de-activation. Upon binding to cognate FAPs, the fluorogens are constrained in the binding pocket of FAP, thus reducing these excited state rotations and stabilizing the fluorescent conformation, resulting in many thousand-fold increases in fluorescence emission.

The FAPs were initially isolated by screening a yeast cell surface display library containing ~10^9 synthetically recombined scFv clones by FACS for fluorescence resulting from protein-dye interactions. Clones displaying high affinity and high quantum yield against cognate dyes were enriched through multiple rounds of sorting. Several FAP-dye pairs were characterized, with some showing single-digit nanomolar affinity for both purified and yeast cell-surface-displayed FAP. Remarkably, no cross-reactivity was observed between TO-1 and MG-2p FAPs, thus permitting multi-color labeling and imaging. Two classes of MG-based fluorogens were synthesized: cell surface receptors tagged with FAP are readily detected with an impermeable dye MG-2p or MG-11p, whereas proteins within the secretory compartments in the cell can be stained with a membrane-permeable dye MG-ester. Recently, another MG-based cell-impermeable fluorogen MG-B-Tau was designed to improve cell exclusion and reduce nonspecific binding and showed fast binding kinetics on the cell surface (k_on = 5 × 10^5 M−1 s−1; Yan et al. 2014a). TO and MG fluorogens are well suited for flow cytometry and microscopy assays with commonly used laser lines: 488- or 514-nm excitation for TO and 633-nm excitation for MG. To expand the fluorescence emission range with FAPs, Ozhalici-Unal et al. (2008) took advantage of the yeast cell surface display library and isolated FAP clones for dimethylindole red (DIR), another unsymmetrical cyanine dye. One of the DIR clones, K7, showed nanomolar binding affinity (Kd 14 nM) and relatively high quantum yield (0.33). Surprisingly, in addition to binding to the bulky fluorogen DIR, K7 can also accommodate a few smaller TO-1-related fluorogens, suggesting considerable plasticity of the binding site. This promiscuity of K7 allows for fluorescence readout from 450–750 nm by using one single FAP, spanning most of the visible and near infrared spectrum.
Further efforts have been made to select FAPs that can activate an oxazole-thiazole-blue-derived blue-fluorescing cyanine dye (Zanotti et al. 2011). Moreover, DIR has been modified with an electron-withdrawing cyano group and shows significantly improved photostability while retaining a high-binding affinity for the FAP (Shank et al. 2009). Saunders et al. (2014) have taken advantage of the finding that MG fluorogens emit around 670 nm, a region of little cellular autofluorescence, and have developed a bifunctional fusion protein composed of a fluorescein-binding and -quenching scFv and a high-affinity MG-binding FAP. When bound to fluorescein-conjugated antibodies, this reagent effectively quenches fluorescein and converts the conjugate to a far-red excitation/emission probe, replacing the fluorescein with a high-brightness high-stability MG-FAP complex. Because fluorescein-conjugated antibodies are among the most commonly used, this reagent offers a straightforward and convenient method for reducing cellular background and for improving the signal to noise ratio for both live- and fixed-cell labeling.
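
To put the association rate constant quoted above for MG-B-Tau (5 × 10^5 M−1 s−1) into practical terms, the labeling time scale can be estimated with a pseudo-first-order approximation. This is only a sketch under stated assumptions: the fluorogen is taken to be in large excess over FAP, dissociation is neglected by default, and the 100 nM concentration in the example is a hypothetical value, not one taken from the cited work.

```python
import math

K_ON = 5e5  # M^-1 s^-1, association rate constant quoted in the text for MG-B-Tau


def observed_rate(fluorogen_molar, k_on=K_ON, k_off=0.0):
    """Pseudo-first-order rate constant k_obs = k_on * [fluorogen] + k_off (s^-1).

    Assumes fluorogen in large excess over FAP; k_off defaults to 0 (assumption).
    """
    return k_on * fluorogen_molar + k_off


def half_time(fluorogen_molar, k_on=K_ON, k_off=0.0):
    """Time (s) for binding to reach half of its final level under the same assumptions."""
    return math.log(2) / observed_rate(fluorogen_molar, k_on, k_off)


# Hypothetical example: 100 nM fluorogen in the medium.
conc = 100e-9  # M
print(f"k_obs = {observed_rate(conc):.3f} s^-1, t1/2 = {half_time(conc):.1f} s")
```

Under these assumptions, labeling at submicromolar fluorogen concentrations would approach completion within tens of seconds to minutes; real kinetics on cells may differ because of diffusion, receptor density, and dissociation.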

The non-covalent interaction between the fluorogen and FAP is similar to that of a ligand and its receptor. Therefore, the fraction of the FAP-fluorogen complex can be tuned by adjusting the fluorogen concentration. Fractional occupancy increases with increased fluorogen concentration and can be modeled as a hyperbolic binding function. Low fractional occupancy or low-density labeling can then be readily achieved with controllable low concentrations of the free fluorogen. Based on this principle, FAP-based localization super-resolution microscopy has been developed for both fixed- and live-cell imaging (Yan et al. 2014b). This system is advantageous over traditional approaches (STORM, Rust et al. 2006; photo-activated localization microscopy [PALM], Betzig et al. 2006), because it does not require special imaging buffers or photoactivation or photoswitching with a second laser line. Similarly, Schwartz et al. (2014) demonstrated the use of the FAP-fluorogen system for single-particle tracking experiments. FAP was fused to the γ subunit of the high affinity IgE receptor (FcεRI; Schwartz et al. 2014). This labeling scheme, for the first time, enabled the tracking of FcεRI independently of a separately labeled IgE molecule and provided new insights into FcεRI signaling and dynamics, both when free and when bound to an activating (cytokinergic) IgE molecule. All this work demonstrates that the FAP-based tunable labeling system is a viable tool for studying the biology of many receptor proteins at the single-molecule level.
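
The hyperbolic binding function mentioned above can be written as occupancy = [L] / ([L] + Kd). The sketch below shows how one might choose a fluorogen concentration for sparse, low-density labeling. It assumes a simple one-site equilibrium in which the free fluorogen concentration equals the total added (no depletion); the function names and the Kd used in the example are hypothetical.

```python
def fractional_occupancy(fluorogen_nM, kd_nM):
    """Fraction of FAP bound at equilibrium: [L] / ([L] + Kd)."""
    if fluorogen_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return fluorogen_nM / (fluorogen_nM + kd_nM)


def fluorogen_for_occupancy(target, kd_nM):
    """Invert the hyperbola: [L] = Kd * target / (1 - target)."""
    if not 0 <= target < 1:
        raise ValueError("target occupancy must be in [0, 1)")
    return kd_nM * target / (1.0 - target)


# Hypothetical FAP-fluorogen pair with Kd = 1 nM (illustrative value only).
kd = 1.0
for conc in (0.01, 0.1, 1.0, 10.0):
    print(f"{conc:>5} nM -> occupancy {fractional_occupancy(conc, kd):.3f}")
print("nM needed for 1% occupancy:", fluorogen_for_occupancy(0.01, kd))
```

At concentrations far below Kd the occupancy is nearly linear in fluorogen concentration, which is what makes the labeling density easy to tune for localization microscopy or single-particle tracking.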

The cell-impermeable fluorogens have enabled selective labeling of receptor proteins, including the β2 adrenergic receptor (β2-AR), the insulin-regulated glucose transporter (GLUT4), and the cystic fibrosis transmembrane conductance regulator (CFTR), and quantitative analysis of β2-AR endocytosis by flow cytometry and microscopy (Fisher et al. 2010; Holleran et al. 2010). On the other hand, intracellular protein labeling with FAP is less straightforward. This is because antibodies are naturally secreted into the extracellular space or are membrane-bound and do not fold efficiently in the cytoplasm (Farinas and Verkman 1999). Nonetheless, an engineered FAP clone H6 that does not contain any disulfide bonds has been successfully expressed in the cytoplasm by making an N-terminal fusion to β-actin and used in STED nanoscopy (Fitzpatrick et al. 2009). Recently, crystallographic studies have revealed that two L5*VL domains form homodimers when complexed with MG (Szent-Gyorgyi et al. 2013). This has led to the construction of dL5**, a covalent tandem dimer of L5 with point mutations (E52D L91S) to confer tighter binding and increased quantum yield (Szent-Gyorgyi et al. 2013). dL5** has been successfully used to target proteins both on the plasma membrane and inside the cells (Saurabh et al. 2014; Wang et al. 2014; Yan et al. 2014a).

The FAP-fluorogen complex forms a modular probe in two ways. First, the same FAP clone can be paired with different fluorogens or fluorogen derivatives for different biological applications, such as pH measurement during protein trafficking (Grover et al. 2012) and fluorescence detection of low abundance proteins on the cell surface (Szent-Gyorgyi et al. 2010). Second, the FAP-fluorogen complex is generally transferrable, from one protein of interest to another, when fused to protein affinity reagents, such as affibodies. Wang et al. (2014) have expressed FAP fused to an EGFR Affibody ZEGFR:1907 as one single recombinant protein. In this case, the affibody serves as an affinity probe with high specificity for EGFR, whereas FAP provides the signal readout when bound to various fluorogens. A new MG-based signal-amplifying dendrimer Hexa-Cy3-MG has been used to label the endogenous EGFR on A431 cells, leading to significantly improved detection sensitivity (Wang et al. 2014). By altering the order of adding MG-B-Tau (Yan et al. 2014a), an improved cell impermeable dye, and EGF, subpopulations of EGFR undergoing endocytosis can be tracked and quantified (Wang et al. 2014).

The highly specific interaction between FAP and fluorogen suggests that fluorogens can serve as an affinity handle to incorporate other functional groups into the FAP-fluorogen complex and to provide targeted labeling and fluorescence readout. This “localization” concept has previously been experimentally demonstrated; for example, the calcium sensors Indo-1, Fura-2, and BOCA-1 have each been linked to O6-benzylguanine (BG), the substrate for the SNAP tag, in order to report local calcium changes in the cell (Bannwarth et al. 2009; Kamiya and Johnsson 2010; Ruggiu et al. 2010). In this case, BG acts as a structural moiety to bring the fluorophore/functional group to the tagged protein and is not fluorescent. In addition to targeting functional groups to cells, such “hybrid” systems have been used to target quantum dots (QDs) to cells. QDs are semiconductor nanocrystals (Bruchez et al. 1998). Because of their superior brightness and photostability, QDs are well suited for single-molecule long-term tracking experiments. However, QDs are generally targeted to proteins via antibodies, an approach that is limited by the availability and low affinity of many antibodies. Saurabh et al. (2014) have developed a FAP/MG-biotin-based approach to target streptavidin-QD conjugates to both membrane and intracellular proteins through genetic fusion with FAP. Previously, QDs have been targeted to proteins by using BirA (Howarth et al. 2005) and LplA (Liu et al. 2012). However, for both approaches, the enzymes need to be provided exogenously, either by expression and purification and then addition to the cells or by coexpression in the ER (BirA-ER) to biotinylate AP in the secretory compartments. In addition, LplA-mediated QD targeting is achieved by using a two-step protocol to conjugate sequentially 10-bromodecanoic acid and then HaloTag-derivatized QD, which is not commercially available.

On the other hand, in the scenario of fluorogen-based biosensors, when an environmentally sensitive probe is conjugated to the fluorogen, a FRET pair is formed once the fluorogen binds to FAP and becomes activated. This rational design has led to the development of fluorogen-based tandem dyes as a platform for a wide variety of biosensors.

One example of such a tandem dye is TO1-CypHer5. CypHer5 is a water-soluble pH-sensitive cyanine dye. With a pKa of 6.1, CypHer5 is only fluorescent under acidic conditions. Traditionally, CypHer5 has been conjugated to antibodies to study receptor internalization and the pH change in endocytic compartments (Adie et al. 2002; Nordberg et al. 2007; Wenzel et al. 2012). However, antibody conjugation is not always practical, and suitable antibodies are not always readily available. To label selectively the receptors present on the cell surface, Grover et al. (2012) have utilized TO1-CypHer5 to label FAP-tagged β2-AR. Because of the spectral overlap between TO1 and CypHer5, intramolecular energy transfer takes place from the donor TO1 to the acceptor CypHer5. From neutral pH to acidic pH, the extinction coefficient of CypHer5 increases eight-fold, leading to a significant change in the ratio of CypHer5 emission over TO1 emission. This ratio therefore provides a well-defined ratiometric signature of the pH. As the receptor internalizes from the plasma membrane (pH 7.4) to endosomes (pH 5-6), pH changes are easily detected with TO1-CypHer5 (Fig. 2).
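
The pH dependence described above can be illustrated with the Henderson-Hasselbalch relation. The sketch below estimates only the fraction of CypHer5 in its protonated (acid-form) state from the pKa of 6.1 given in the text, assuming a simple one-site protonation equilibrium. It does not model FRET efficiency or the measured emission ratio, and the function name is hypothetical.

```python
def protonated_fraction(pH, pKa=6.1):
    """Fraction of dye in the protonated form: 1 / (1 + 10**(pH - pKa)).

    pKa = 6.1 is the value quoted in the text for CypHer5; a single
    protonation site is an assumption of this sketch.
    """
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


# Compartment pH values taken from the text: plasma membrane 7.4, endosomes 5-6.
for label, pH in (("plasma membrane", 7.4), ("endosome", 6.0), ("endosome", 5.0)):
    print(f"{label:>15} (pH {pH}): protonated fraction = {protonated_fraction(pH):.2f}")
```

In this simple model, less than 5 % of the acceptor is protonated at pH 7.4, whereas the majority is protonated at pH 5, which is consistent with the shift toward acceptor emission on endocytosis.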

Fig. 2 Structure and spectral properties of TO1-CypHer5 (Grover et al. 2012). a Structure of TO1-CypHer5. TO1 is linked to pH-dependent CypHer5, forming a FRET pair. The pH sensitivity is conferred by the protonation of the free indole nitrogen atom. b Emission spectra of TO1-CypHer5 when bound to FAP at various pH, normalized to the TO1 emission peak. Excitation wavelength: 488 nm. c–c’ Color change of TO1-CypHer5 during FAP-tagged β2-AR endocytosis. Upon addition of the agonist, isoproterenol (ISO; 10 μM) to the cells, FAP-β2-AR undergoes endocytosis, leading to a significant increase of red emission from the acidic vesicles, while the plasma membrane (pH 7.4) still shows exclusively green emission. Bars 10 μm

Another application of the tandem dye is the development of fluorogenic dyedrons (Szent-Gyorgyi et al. 2010). Dyedrons comprise multiple Cy3 donors linked to a single MG acceptor. When free in aqueous solution, although efficient energy transfer occurs from Cy3 to MG, the energy is dissipated nonradiatively because of the free rotation of the MG chromophore, which acts as a quencher. Upon binding to a cognate FAP, MG is held rigidly in the binding site of FAP, leading to fluorescence emission, with enhanced excitation at the donor wavelength proportional to the number of donor molecules attached. The properties of dyedrons have been characterized with L5, a FAP clone affinity matured to show increased binding affinity and quantum yield against MG. Tight binding is again observed with dyedrons: the highest Kd is 15 nM, indicating that the addition of multiple Cy3 molecules does not abolish the binding between MG and FAP, although the affinity is significantly reduced relative to that of free MG (Kd <1 nM). The extinction coefficient of dyedrons increases in proportion to the number of Cy3 molecules, with the highest being measured at 530,000 M−1 cm−1, about 10 times higher than that of a typical fluorescent protein (Newman et al. 2011). Like MG, free dyedrons in solution are essentially dark (quantum yield: <0.0005). The intrinsic high brightness of dyedrons coupled with the extremely low background makes them an ideal tool for detecting low-copy-number proteins with a high signal to noise ratio. Currently, dyedrons are limited to tagging cell surface proteins because of their negative charges.

Similarly, MG has been linked to tetramethylrhodamine (TMR) to give TMR-MG (Yushchenko et al. 2012). Like TO1-Cypher5 and dyedrons, TMR-MG was designed for intramolecular energy transfer. Surprisingly, instead of being quenched, one of the isomers, namely TMR-para-MG, showed pronounced fluorescence at TMR emission in the absence of FAP. This property of TMR-para-MG was discovered to be caused by the pH dependence of MG. At pH 7.4, 65 % of MG-NH2 is in the dye form, whereas the rest is in the colorless carbinol form. As a result, TMR can only be quenched to ~70 % by the dye form of MG through FRET, and the unquenched TMR (30 %) still remains fluorescent. Similar to other cationic dyes, TMR-para-MG localizes in mitochondria (pH 8.0) after crossing the cell membrane and can be easily detected by TMR emission. This unique color-switching behavior of TMR-para-MG provides a distinct advantage over other tandem dyes: TMR-para-MG labels FAP-tagged structures with MG emission, and cells not expressing FAP can be visualized by mitochondrial staining in the TMR channel. Moreover, TMR-para-MG has been demonstrated to be applicable to two-photon imaging because of the high two-photon absorption cross section of TMR.

Clearly, the FAP-fluorogen system possesses a few distinct advantages for a range of biological applications. As a genetically encoded tag, FAP allows for exclusive cell surface labeling with an impermeable fluorogen. FAP labeling has also been extended to the interior of the cell by employing a disulfide-bond-free FAP and a cell-permeable fluorogen. These labeling schemes provide unique spatial control by chemically separating the cell surface target from the intracellular pool. In addition, because the fluorescence signal is turned on only after the addition of the fluorogen, temporal control is readily achieved, and this system is well suited for pulse-chase and order of addition experiments that can be designed to detect various components of the cellular trafficking pool. Free fluorogens exhibit an extremely low background in a number of physiological buffers. Therefore, no washing steps are needed before detection. In this two component system, both the FAP and the fluorogen can be tuned in order to obtain the desired properties. For example, fluorogens can be designed to display improved spectral properties, brightness, and photostability. On the other hand, through random mutagenesis and directed evolution, FAPs can be selected to accommodate a large repertoire of fluorogens. With further efforts, we expect that the FAP-fluorogen system will be useful for addressing exciting unexplored biological questions.

Concluding remarks

Since the development of the first chemical tag by Tsien and colleagues in 1998 (Griffin et al. 1998), chemical tagging has become an invaluable tool for cell biologists. Although the chemical tagging approaches described here have provided a breakthrough for studying the activity and dynamics of proteins in their native environment, each of the methods still has some shortcomings, and the choice of the labeling technique should match the biological question to be answered. Ideally, the chemical tag and labeling approach should meet the following criteria: the tag size should be small enough such that the tag does not perturb the structure and function of the protein of interest; the fluorophore conjugates and the byproduct from the labeling reaction should not exhibit any cytotoxicity and should be biocompatible; the fluorophore conjugates should be inert to the oxidative/reducing environment and be able to function both on the cell surface and in the cytoplasm; the labeling reaction should be fast, highly specific, and orthogonal to the mammalian system; finally, the fluorophores should have a high photon output and generate minimum background to ensure a high signal to noise ratio. With continuous efforts aimed at developing novel chemical labeling strategies, we foresee that chemical tagging will be widely adopted for the study of sophisticated biological questions and will greatly advance our understanding of human disease and normal cell function.