Introduction

Epithelial cell adhesion molecule (EpCAM) is a transmembrane glycoprotein expressed exclusively in epithelia and epithelial-derived neoplasms (Armstrong and Eck 2003). Extensive studies have shown that EpCAM functions not only in its originally described role in cell adhesion (Ladwein et al. 2005; Litvinov et al. 1997; Nochi et al. 2004; Trzpis et al. 2007) but also in signaling (Guillemot et al. 2001; Maetzel et al. 2009; Munz et al. 2004), cell migration (Guillemot et al. 2001; Osta et al. 2004), proliferation (Munz et al. 2004; Osta et al. 2004), and differentiation (Cirulli et al. 1998; Osta et al. 2004). Most importantly, immunohistochemical studies have revealed that EpCAM is overexpressed on carcinoma cells isolated from patients with breast, prostate, ovarian, lung, colon, renal, and gastric cancer (Baeuerle and Gires 2007; Patriarca et al. 2012; Spizzo et al. 2004, 2006; Went et al. 2005, 2006), underscoring its value as a diagnostic biomarker for various cancers. Moreover, EpCAM has recently been recognized as a critical biomarker of cancer stem cells, which makes specific EpCAM probes all the more attractive, both for in vitro and in vivo imaging and as carriers for drug conjugates.

Specific probes for cancer biomarkers are often based on monoclonal antibodies (mAbs) (Chaudry et al. 2007; Deonarain et al. 2009; Kwiatkowska-Borowczyk et al. 2015; Linke et al. 2010). Specific mAbs against EpCAM have been generated, and several are commercially available from various vendors for laboratory use. To make a fluorescent probe, the mAb can be tagged directly with a fluorescent molecule, such as fluorescein or rhodamine, typically by random modification of exposed lysine residues through amide bond formation. Alternatively, partial or full reduction of disulfide bond(s) liberates thiol groups, which can then be covalently modified with thiol-selective fluorescent molecules, for instance by maleimide chemistry (Kratz et al. 2012; Shaunak et al. 2006). Despite the high specificity and affinity of mAbs for EpCAM, which form complexes with 1:1 or 1:2 stoichiometry, a drawback of both approaches is that neither produces a homogeneously modified probe. Instead, a fluorescently labeled secondary mAb is often used to detect unlabeled anti-EpCAM mAbs. Because a single secondary mAb directed against the constant region can be combined with a variety of primary mAbs against different antigens of interest, this indirect method is widely used, particularly in in vitro experiments. However, for human in vivo imaging or drug conjugates, the direct approach, in which a humanized mAb is itself modified with such molecules, is the only option.
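To illustrate why random lysine labeling yields a heterogeneous product, the sketch below models the number of dyes per antibody as a Poisson distribution. This is a simplifying assumption (many lysines, each reacting with low probability), not a claim about any particular labeling kit, and the mean degree of labeling of 2.0 is a hypothetical value.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability that an antibody carries exactly k dyes under a Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

MEAN_DOL = 2.0  # hypothetical average degree of labeling (dyes per antibody)

# Fraction of antibody molecules carrying 0..6 dyes
distribution = {k: poisson_pmf(k, MEAN_DOL) for k in range(7)}

for k, p in distribution.items():
    print(f"{k} dyes per antibody: {100 * p:5.1f}%")
```

Under this model, with an average of two dyes, only about 27% of molecules carry exactly two and roughly 13.5% carry none, so a "labeled mAb" preparation is really a mixture of species.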

More recently, probes based on antibody fragments or non-antibody molecules have been developed. Single-chain variable fragments (scFvs), often generated by phage display, are a typical example of the former (Hristodorov et al. 2014; Huls et al. 1999; Hussain et al. 2006). Other protein scaffolds, such as designed ankyrin repeat proteins (DARPins) (Martin-Killias et al. 2011; Stefan et al. 2011) and antigen-specific shark vNAR domains (Zielonka et al. 2014), have also been used to generate high-affinity, EpCAM-specific binders. More distinct alternatives are nucleic acid aptamers, which have been successfully generated by in vitro selection (SELEX) (Jung et al. 2014; Song et al. 2013; Subramanian et al. 2014). All of these molecules can be conjugated with an appropriate fluorescent tag for tumor imaging. Although they are smaller than mAbs, they still exceed 10,000 Da. Thus, obtaining EpCAM-specific probes below 10,000 Da, which would offer better synthetic accessibility and chemical amenability, remains a challenge.

Here, we report macrocyclic peptides that bind strongly to the extracellular domain of EpCAM (ex-EpCAM) with a dissociation constant in the low nanomolar range (1.7 nM), and a fluorescently labeled probe derived from one of them that specifically stains EpCAM-expressing MCF7 cells. This 14-mer thioether-linked macrocyclic peptide containing a single d-tryptophan was discovered by means of the random non-standard peptide-integrated discovery (RaPID) system (Morioka et al. 2015; Passioura et al. 2014; Yamagishi et al. 2011). The probe consists of this macrocycle followed by a (GS)3 linker and a lysine residue whose side-chain ε-amino group is labeled with fluorescein; its molecular mass is below 3000 Da. Notably, this small probe visualized nearly every live cell under high cell-density conditions, which was not achieved by conventional mAb staining. This suggests that probes based on the compact macrocyclic peptide scaffold have great potential as imaging tools for the EpCAM biomarker as well as delivery vehicles for drug conjugates.

Results and Discussion

The RaPID Selection of Anti-EpCAM Macrocyclic Peptides

To discover macrocyclic peptide ligands against ex-EpCAM, we used the RaPID system (Morioka et al. 2015; Passioura et al. 2014), in which a macrocyclic peptide library is ribosomally expressed from the corresponding mRNA library under a reprogrammed genetic code, and each peptide is covalently fused by the ribosome to a puromycin (Pu) linker ligated to the 3′ end of its cognate mRNA (Nemoto et al. 1997; Roberts and Szostak 1997) (Fig. 1). Because the details of the technology have been discussed elsewhere (Hayashi et al. 2012; Ito et al. 2015; Kodan et al. 2014; Morimoto et al. 2012; Tanaka et al. 2013; Yamagishi et al. 2011), here we focus on the design of the library used in this study. The initiator codon (AUG) was reprogrammed to encode N-chloroacetyl-d-tryptophan (ClAc-DW): ClAc-DW-tRNA^fMet_CAU, prepared using a flexizyme (eFx), was added to a Met/RF1-deficient flexible in vitro translation (FIT) system (Goto et al. 2011). Following the initiator ClAc-DW, elongator amino acids were encoded by (NNK)n (N = A, C, G, or U; K = U or G; n = 4–12) in the mRNA library. Downstream, a cysteine residue was encoded by UGC and a (GS)3 linker by (GGCAGC)3, followed by a UAG stop codon. Because RF1 is absent, UAG does not release the peptide chain; instead, the Pu linker, annealed and ligated to the 3′ constant region of each mRNA, is efficiently fused to the C-terminus of the (GS)3 linker via an amide bond. The thiol side chain of a Cys residue, encoded either by a UGU codon appearing in the random region (except at the 2nd position; see Iwasaki et al. 2012) or by the designated downstream UGC codon, selectively reacts with the N-terminal ClAc group (Goto et al. 2008; Iwasaki et al. 2012).
This intramolecular SN2 reaction yields thioether-macrocyclic peptides displayed on their cognate mRNAs, allowing in vitro selection of species that bind ex-EpCAM immobilized on magnetic beads.
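The (NNK)n design can be checked against the standard genetic code with a short sketch. It enumerates the 32 NNK codons and counts what they encode under the standard code; it does not model the reprogramming described above (AUG reassigned at initiation, UAG not read as a release signal in the RF1-deficient system). It shows why UAG is the only stop codon in the random region and why any Cys there is encoded by UGU, as opposed to the designated UGC.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order (first, second, third base); "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# NNK: N = any base at positions 1 and 2, K = U or G at position 3.
nnk_codons = [a + b + k for a in BASES for b in BASES for k in "UG"]
usage = Counter(CODON_TABLE[c] for c in nnk_codons)

stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]
cys_codons = [c for c in nnk_codons if CODON_TABLE[c] == "C"]

print(len(nnk_codons), "codons,", len(set(usage) - {"*"}), "amino acids")
print("stop codons:", stop_codons)
print("Cys codons:", cys_codons)
```

The same 32 codons also explain the uneven representation of residues in such libraries: Leu, Ser, and Arg each have three NNK codons, whereas most other amino acids have one or two.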

Fig. 1

Overview of the RaPID system for the in vitro selection of macrocyclic peptides that bind to the extracellular domain of EpCAM. Messenger RNA libraries containing a random sequence domain, (NNK)4–12, were transcribed from the corresponding cDNA library and conjugated with an oligonucleotide bearing a 3′-puromycin residue. The resulting mRNA library was translated by the FIT system in the presence of initiator tRNA charged with chloroacetyl-d-tryptophan (ClAc-DW). Linear translation products displayed on the individual mRNAs spontaneously cyclized after translation to give the mRNA-displayed macrocyclic peptide library. The library was then applied to the Fc-tagged extracellular domain of EpCAM (ex-EpCAM) immobilized on protein G magnetic beads, and the bound fraction was isolated. Reverse transcription was performed after selection in the 1st round and before selection from the 2nd round onward. The cDNAs of active mRNA–peptide fusions were recovered and amplified by PCR.

We thus performed selection against ex-EpCAM according to the standard procedure (Hayashi et al. 2012; Yamagishi et al. 2011). In the 4th round, the recovery of binding species increased markedly over the background (i.e., species binding to the magnetic beads alone), so after 5 rounds of selection we cloned the enriched peptide species and sequenced 19 clones (Fig. 2a). Sequence alignment of the 19 peptides revealed three classes of macrocyclic peptides, Class I–III (Supplementary Fig. 1). Among them, the Class I peptide (Epi-1), with a 14-mer macrocyclic structure, was the most abundant single species (5/19); its thioether macrocycle was closed by the Cys residue encoded by the designated UGC codon. The third most abundant single species (3/19), the Class II peptide Epi-2, has a small macrocycle with a linear peptide tail. Both Class I and II peptides were expressed in frame from their respective mRNA templates, as is apparent from the triple repeat of the GS linker (Supplementary Fig. 1).
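The relation between library length and ring size can be made concrete with a small sketch. It assumes, as a simplification not stated in the text, that the N-terminal ClAc group reacts with the first Cys at position 3 or later; the two sequences are hypothetical ("w" denotes d-Trp) and are not Epi-1 or Epi-2. Under this assumption, a ring closed by the designated Cys contains 1 (ClAc-DW) + n (random residues) + 1 (Cys) residues, so a 14-mer macrocycle of this kind corresponds to the longest template, n = 12.

```python
def thioether_ring(peptide: str) -> tuple[str, str]:
    """Split a translated peptide into (macrocycle, linear tail).

    Residue 1 carries the N-terminal ClAc group. Assumption (hypothetical rule):
    the ring closes on the first Cys at position 3 or later; position 2 is skipped.
    """
    for index in range(2, len(peptide)):  # 0-based index 2 is residue position 3
        if peptide[index] == "C":
            return peptide[: index + 1], peptide[index + 1 :]
    raise ValueError("no Cys available for thioether macrocyclization")

# Hypothetical n = 12 peptide closed by the designated Cys, followed by the (GS)3 linker.
ring, tail = thioether_ring("w" + "AYRVLNPTKFES" + "C" + "GSGSGS")
print(len(ring), tail)

# Hypothetical peptide with a Cys (UGU) early in the random region: small ring plus tail.
ring, tail = thioether_ring("wRLC" + "YSPHQCGSGSGS")
print(len(ring), tail)
```

The second case mimics, in outline only, the Class II architecture of a small macrocycle carrying a linear tail.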

Fig. 2

In vitro selection of macrocyclic peptides that bind to ex-EpCAM. a Progress of the selection. Recovery rates of cDNA in each round were calculated from the input and recovered amounts of cDNA determined by quantitative PCR. Recovery rates in the positive selection against ex-EpCAM-immobilized protein G beads are shown in black, and those in the negative selection against free protein G beads in gray. b Two representative peptides identified from the cDNA pool after the 5th round and their linear analogs. See Supplementary Fig. 1 for the full list of identified peptide sequences. c Binding of mRNA-displayed peptides to ex-EpCAM immobilized on magnetic beads. Each peptide was expressed and conjugated with its mRNA template in the FIT system. The resulting mRNA-displayed peptides were applied to free magnetic beads (gray) or ex-EpCAM-immobilized beads (black), and recovery rates of cDNA were calculated from the input and recovered amounts of cDNA determined by quantitative PCR.

In the Class IIIa–c peptides (Epi-3–9), the Cys residue that forms the thioether bond with the N-terminus is flanked by two consensus regions (Supplementary Fig. 1). The Class IIIa peptides, including Epi-3, the second most abundant single species, share the consensus sequence LGLI, whereas the Class IIIb and IIIc peptides carry a single substitution, G to H and L to H, respectively. The C-terminal sequence following the Cys residue, seven consecutive alanine residues followed by RTGGG, is fully conserved in the Class IIIa–c peptides. This conserved sequence originated from a frame-shift, likely caused by a single-base deletion in the random region of the mRNA library: the deletion shifts the originally designed UGC-(GGC-AGC)3 region, encoding a single Cys residue and the (GS)3 linker, into the +1 frame, where it is read as GCG-GCA-GCG-GCA-GCG-GCA-GCU (the last codon completed by the first base of the UAG codon), yielding seven consecutive alanine residues.
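The frame-shift explanation can be verified by translating the designed downstream region in both frames. The sketch uses the standard genetic code and only the sequence stated in the text (UGC, (GGCAGC)3, UAG); the 3′ constant region that encodes the subsequent RTGGG is not given here, so the translation stops at the seven alanines.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(rna: str) -> str:
    """Translate complete codons from the first base; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[rna[i : i + 3]] for i in range(0, len(rna) - 2, 3))

designed = "UGC" + "GGCAGC" * 3 + "UAG"  # Cys, (GS)3 linker, UAG

in_frame = translate(designed)
# After a one-base deletion upstream, the leading U of UGC is absorbed into the
# preceding codon, so the rest of this region is read in the +1 frame.
shifted = translate(designed[1:])

print(in_frame)
print(shifted)
```

In frame, the region reads CGSGSGS followed by the stop signal; in the +1 frame it reads AAAAAAA, and the UAG codon is bypassed, so translation runs on into the 3′ constant region.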

Characterization of the Macrocyclic Peptides in the Display Format

To verify the binding activities of the selected peptides, Epi-1 and Epi-3 were chosen for further study (Fig. 2b). We first performed a binding assay in the display format, in which the recovery of Epi-1-mRNA or Epi-3-mRNA on ex-EpCAM-immobilized beads versus free beads was determined by real-time PCR (Fig. 2c). To assess the importance of the macrocyclic scaffold, we also prepared their linear versions, LinEpi-1-mRNA and LinEpi-3-mRNA (Fig. 2b), and monitored their recovery rates (Fig. 2c). Both Epi-1-mRNA and Epi-3-mRNA showed recovery rates about three orders of magnitude above the background, indicating that these macrocyclic peptides bind ex-EpCAM. Linearization of either peptide completely abolished binding, indicating that the macrocyclic scaffold is critical for both.
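How a recovery rate follows from qPCR measurements can be sketched as below. The text does not describe the quantification method, so this assumes a simple Ct-based model with a fixed amplification efficiency and input and recovered samples measured on equivalent volumes; the Ct values are hypothetical and are not data from this study.

```python
import math

def relative_amount(ct: float, efficiency: float = 1.0) -> float:
    """Template amount (arbitrary units) from a Ct value, assuming a per-cycle
    amplification factor of (1 + efficiency); efficiency = 1.0 means perfect doubling."""
    return (1.0 + efficiency) ** (-ct)

def recovery_rate(ct_input: float, ct_recovered: float, efficiency: float = 1.0) -> float:
    """Recovered cDNA as a fraction of input cDNA (equivalent volumes assumed)."""
    return relative_amount(ct_recovered, efficiency) / relative_amount(ct_input, efficiency)

# Hypothetical Ct values, for illustration only.
on_target = recovery_rate(ct_input=12.0, ct_recovered=17.0)   # ex-EpCAM beads
background = recovery_rate(ct_input=12.0, ct_recovered=27.0)  # free beads

print(f"target recovery: {100 * on_target:.2f}% of input")
print(f"fold over background: {on_target / background:.0f}")
print(f"Ct gap for a 1000-fold difference: {math.log2(1000):.1f} cycles")
```

With perfect doubling, a recovery about three orders of magnitude above background corresponds to Ct values roughly 10 cycles apart (log2 1000 ≈ 10). When efficiency is not ideal, a standard-curve fit would replace `relative_amount`.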

Because Epi-3 has a peptide tail of hepta-alanine followed by RTGGG, we asked how important these residues are for binding. We therefore prepared four deletion mutants of Epi-3 (Supplementary Fig. 2), in which the tail peptide was replaced completely with a (GS)3 linker (Epi-3-GS), partly with the GS linker (Epi-3-A1 and Epi-3-A3), or with hexa-alanine alone (Epi-3-A6). To our surprise, only Epi-3-A6 bound ex-EpCAM, with a slight loss of activity relative to the wild type, suggesting that the poly-alanine tail contributes to binding.

Synthesis of a Fluorescent Probe for Cellular EpCAM

Prior to synthesizing a fluorescent probe based on the macrocyclic peptides, we chemically synthesized Epi-1 and Epi-3 as C-terminal carboxamides and determined their kinetic constants for ex-EpCAM on an SPR sensor chip (Supplementary Fig. 3). The association rate constant (ka) and dissociation rate constant (kd) of Epi-1 were 5.1 × 10⁶ M⁻¹ s⁻¹ and 8.5 × 10⁻³ s⁻¹, respectively, giving a dissociation constant (KD = kd/ka) of 1.7 nM (Table 1). Similar values were observed for Epi-3 (Table 1). These results indicate that both macrocyclic peptides bind ex-EpCAM with high affinity, as expected from the qualitative results in the display format. However, we were concerned that the hydrophobic hepta-alanine tail of Epi-3 might cause non-specific interactions with other transmembrane proteins on cells. We therefore focused on Epi-1 for the development of a fluorescent probe for cellular EpCAM.
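The reported constants can be cross-checked with a short calculation. It assumes a 1:1 binding model, the usual basis for fitting SPR kinetics, although the fitting model is not stated in the text. Besides KD = kd/ka, it derives the dissociation half-life of the complex, t1/2 = ln 2 / kd, which is an illustrative derived quantity and not a value reported in the study.

```python
import math

ka = 5.1e6   # association rate constant of Epi-1, M^-1 s^-1 (from the text)
kd = 8.5e-3  # dissociation rate constant of Epi-1, s^-1 (from the text)

KD = kd / ka                  # equilibrium dissociation constant, M (1:1 model)
half_life = math.log(2) / kd  # time for half of pre-formed complexes to dissociate, s

print(f"KD = {KD * 1e9:.1f} nM")
print(f"complex half-life = {half_life:.0f} s")
```

The computed KD reproduces the 1.7 nM value, and the half-life of roughly 80 s indicates that the high affinity stems largely from fast association.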

Table 1 Kinetic and binding constants of Epi-1 and Epi-3

We designed two fluorescent probes based on the Epi-1 sequence. One was macrocyclic Epi-1 whose C-terminus was modified with a lysine carboxamide residue bearing fluorescein on its side-chain ε-amino group (Epi-1-F, Fig. 3a). The other was linear Epi-1 with the same fluorescein modification (LinEpi-1-F, Fig. 3a). Both molecules were readily synthesized by standard solid-phase peptide synthesis and purified by preparative HPLC.

Fig. 3

Imaging of EpCAM on living cells with fluorescein-labeled Epi-1. a Sequences of fluorescein-labeled Epi-1 (Epi-1-F) and its linear analog (LinEpi-1-F). Fluorescein was conjugated to the side chain of the C-terminal lysine of each peptide. b–d Laser confocal images of cells treated with the peptides and anti-EpCAM antibody. MCF7 cells or HuO-3N1 cells were treated with fluorescein-labeled peptides and stained with an anti-EpCAM antibody and an anti-IgG antibody conjugated with Alexa Fluor 633.

Fluorescent Imaging of EpCAM-Expressing Cells Using the Macrocyclic Peptide Probes

We first tested whether Epi-1-F could fluorescently stain EpCAM-expressing MCF7 breast cancer cells. Incubating MCF7 cells with Epi-1-F for 5 min, followed by gentle washing with medium, allowed us to clearly visualize the membrane of individual live cells by confocal fluorescence microscopy (Fig. 3b). The same cells were also stained by conventional immunostaining with an anti-EpCAM antibody, which gave nearly the same staining pattern as Epi-1-F. In contrast, LinEpi-1-F, which was expected to lack EpCAM-binding ability based on the display-format assay (Fig. 2c), did not stain MCF7 cells at all, whereas the antibody control did (Fig. 3c). This indicates that the binding of Epi-1 to EpCAM is essential for cell staining. As an additional negative control, we examined staining of EpCAM-deficient HuO-3N1 cells with Epi-1-F. As expected, neither Epi-1-F nor the antibody stained HuO-3N1 cells, demonstrating the high specificity of Epi-1-F as an EpCAM probe.

During the course of this study, we noticed that certain areas of cells were stained only by Epi-1-F, not by the antibody (see the bottom part of the merged image in Fig. 3b). These areas turned out to have a higher cell density than others. We therefore asked whether Epi-1-F could fully stain MCF7 cells under high cell-density conditions. Indeed, Epi-1-F stained nearly every live cell, not only at the surface of dense cell clusters (Fig. 4a) but also in the middle region at cell–cell interfaces (Fig. 4b). In contrast, the antibody stained the surface area better than the middle area.

Fig. 4

Laser confocal images of cell cultures under high cell-density conditions stained with Epi-1-F and anti-EpCAM antibody. Cultures of MCF7 cells at a higher cell density were treated with Epi-1-F and an anti-EpCAM antibody, followed by an anti-IgG antibody conjugated with Alexa Fluor 633.

Although the above qualitative analysis of adherent MCF7 cells indicated that Epi-1-F binds EpCAM on the cells effectively, we also performed a quantitative analysis of peptide staining of suspended (floating) MCF7 cells by flow cytometry. EpCAM-positive MCF7 cells and EpCAM-negative HLF cells were separately treated with trypsin and resuspended in Hanks' balanced salt solution (HBSS). The cells were then stained simultaneously with Epi-1-F and an anti-EpCAM monoclonal antibody conjugated with allophycocyanin (APC, whose fluorescence emission at 670 nm is distinct from that of fluorescein at 520 nm) and analyzed by flow cytometry (Supplementary Fig. 4a, b). Epi-1-F stained 81.7% of MCF7 cells but only 1.1% of HLF cells. The antibody stained 99.3% of MCF7 cells and a negligible population of HLF cells. These results indicate that, despite its small size compared with the antibody, Epi-1-F effectively distinguishes EpCAM-positive from EpCAM-negative cells.
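The percentages above are fractions of events falling inside a positive gate. The text does not state how the gates were set, so the sketch below assumes one common convention, a threshold at the 99th percentile of a negative-control population, and uses synthetic log-normal intensities in place of real cytometry data (all values hypothetical).

```python
import random
import statistics

def positive_fraction(sample: list[float], threshold: float) -> float:
    """Fraction of events whose fluorescence exceeds the gate threshold."""
    return sum(x > threshold for x in sample) / len(sample)

rng = random.Random(0)
# Synthetic fluorescein intensities (arbitrary units); NOT data from this study.
negative = [rng.lognormvariate(0.0, 0.5) for _ in range(10_000)]  # stands in for EpCAM-negative cells
positive = [rng.lognormvariate(3.0, 0.5) for _ in range(10_000)]  # stands in for EpCAM-positive cells

# Hypothetical gate: 99th percentile of the negative-control distribution.
gate = statistics.quantiles(negative, n=100)[98]

print(f"negative control: {100 * positive_fraction(negative, gate):.1f}% positive")
print(f"test population:  {100 * positive_fraction(positive, gate):.1f}% positive")
```

By construction, about 1% of the negative-control events fall above such a gate, so the percentage reported for a negative cell line depends partly on where the gate is placed.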

Conclusion

Here, we have reported the RaPID selection of thioether macrocyclic peptides that bind strongly to ex-EpCAM. The most abundant macrocyclic peptide among the identified clones, Epi-1, binds EpCAM with remarkably high affinity (single-digit nanomolar KD). The 2904 Da fluorescent probe derived from Epi-1 specifically stains EpCAM-expressing cells. Although Epi-1 was selected in vitro against the extracellular domain of EpCAM, it binds full-length cellular EpCAM in a highly specific manner, making it an excellent fluorescent imaging probe. In particular, Epi-1-F stained MCF7 cells under high cell-density conditions, including cell–cell interfaces, more effectively than conventional antibody staining. This result encourages us to develop not only an imaging probe for EpCAM-expressing cancer stem cells but also a drug-delivery vehicle in the form of a peptide–drug conjugate. Moreover, the RaPID system should enable the discovery of potent macrocyclic peptide ligands against a wide array of tumor-associated transmembrane proteins, opening new opportunities for the next generation of small non-protein drugs.