Main

Despite significant advances in diagnosing and treating cancer, metastasis remains a barrier to successful therapy and the main cause of cancer-related death1. The epithelial-to-mesenchymal transition (EMT), in which epithelial cells depolarize, lose their cell–cell contacts and acquire an elongated, fibroblast-like morphology, is a potential mechanism by which tumour cells gain metastatic features. Functional consequences of EMT include enhanced motility, invasion and resistance to apoptotic stimuli2,3. Moreover, through EMT, tumour cells acquire cancer stem cell, secondary tumour-initiating and chemoresistance properties4,5,6. However, the importance of EMT in vivo is fiercely debated owing to major technical challenges. Mesenchymal tumour cells cannot easily be distinguished from neighbouring stromal cells, and metastatic lesions mostly exhibit epithelial phenotypes7. The latter may be explained by the hypothesized reverse process, mesenchymal-to-epithelial transition (MET), in disseminated tumour cells. Studies have confirmed that mesenchymal cells are more capable of escaping the primary tumour and reaching distant sites, but it remains unproven that those same cells complete the full metastatic cascade by forming a secondary nodule. Without evidence for the dissemination, colonization and metastatic outgrowth of mesenchymal tumour cells, the role of EMT will remain contested. In this study, we used multiple transgenic mouse models, combining a cell lineage tracing approach with characterization of epithelial and mesenchymal markers, to address whether EMT is required for metastasis. The newly established transgenic model also provided a unique opportunity to study the contribution of EMT to chemoresistance.

EMT lineage tracing during metastasis

To track EMT during metastasis in vivo, we generated a mesenchymal-specific, Cre-mediated fluorescent marker switch strategy and established a triple-transgenic mouse model (MMTV-PyMT/Rosa26-RFP-GFP/Fsp1-cre, tri-PyMT, Fig. 1a). These mice develop spontaneous multifocal mammary adenocarcinomas with distinct epithelial characteristics resembling the human luminal subtype, which give rise to lung metastases with high penetrance8,9. The Fsp1 (fibroblast-specific protein 1) promoter drives expression of Cre recombinase in cells of mesenchymal lineage10. A Cre-switchable fluorescent marker (lox-RFP-STOP-lox-GFP) is ubiquitously expressed under the control of the β-actin promoter in the Rosa26 locus11. Fsp1 is the critical gatekeeping gene of EMT initiation12, and its early activation in this process13 allows lineage tracing of tumour cells that have undergone EMT in vivo. Importantly, the colour switch is irreversible: because Cre permanently excises the floxed RFP-STOP cassette, tumour cells that undergo EMT would remain GFP+ even if they subsequently undergo MET in the metastatic organs14.
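The reporter logic described above behaves as a one-way state machine. The sketch below is a simplified illustration written for this explanation (the class and function names are hypothetical): it assumes instantaneous, 100% efficient excision whenever the Fsp1 promoter is active and ignores the transient RFP+/GFP+ stage during which pre-existing RFP protein persists after excision.

```python
from enum import Enum


class Reporter(Enum):
    RFP = "RFP"  # floxed RFP-STOP cassette intact: RFP expressed, GFP silent
    GFP = "GFP"  # cassette excised by Cre: GFP expressed from the same locus


def trace_lineage(fsp1_active_history):
    """Return the reporter colour of one cell given its Fsp1 promoter history.

    fsp1_active_history: sequence of booleans, one per time point, True when the
    Fsp1 promoter (and therefore Fsp1-Cre) is active in the cell. Hypothetical
    model: excision is assumed to be instantaneous and 100% efficient.
    """
    state = Reporter.RFP
    for active in fsp1_active_history:
        if active:
            # Cre excises the RFP-STOP cassette; the DNA change is permanent.
            state = Reporter.GFP
        # If the promoter later switches off (e.g. after MET), nothing restores
        # the excised cassette, so the state is left unchanged.
    return state


never_emt = trace_lineage([False, False, False])          # epithelial throughout
emt_then_met = trace_lineage([False, True, True, False])  # EMT followed by MET
print(never_emt.value, emt_then_met.value)
```

A cell that passes through EMT and later reverts by MET still reads out as GFP+, which is why RFP+ metastases argue against an EMT history in this model.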

Figure 1: Establishing an EMT lineage tracing system in triple-transgenic mice.

a, Schematic of triple-transgenic mice carrying polyoma middle-T (PyMT) or Neu oncogenes driven by the MMTV promoter, Cre recombinase under the control of the Fsp1 promoter, and floxed RFP-STOP followed by GFP under control of the β-actin promoter in the Rosa26 locus. RFP+ epithelial tumour cells undergoing EMT permanently convert into GFP+ cells following activation of Fsp1–Cre. b, c, Immunofluorescent microscopy images of tri-PyMT primary tumours (b) and lung metastases (met; c) (>10 sections from 3 mice), depicting RFP+ and GFP+ cells within the tumour bed, and staining (white, pseudo-coloured) for PyMT. Scale bars, 100 μm.

Primary breast tumours developed in tri-PyMT mice at 8 weeks of age. Immunofluorescence revealed that the majority of tumour cells, identified by PyMT oncogene expression, were RFP+ (Fig. 1b). These cells expressed E-cadherin and lacked vimentin (Extended Data Fig. 1a), indicating an epithelial phenotype. The GFP+ cells detected in the tumour bed were largely haematopoietic cells, as they were PyMT-negative and expressed the pan-haematopoietic marker CD45 (Extended Data Fig. 1a), consistent with previous reports15. Together, these data suggest that tumour cells maintain their original RFP expression and epithelial phenotype in the primary tumour.

Lung metastases developed spontaneously in tri-PyMT mice at 12 weeks of age. Surprisingly, the PyMT+ metastatic lesions were RFP+ (Fig. 1c) and epithelial (E-cadherin+/vimentin−) (Extended Data Fig. 1b), whereas only non-tumour cells expressed GFP. These results indicate that tumour cells did not activate the mesenchymal-specific Fsp1 promoter and retained their epithelial phenotype during metastasis. Thus, tumour cells may not undergo EMT to form metastatic lesions.

Lineage tracing in additional models

To exclude the possibility that the absence of EMT in metastasis is unique to PyMT-driven breast tumours, we established EMT lineage tracing in the Neu oncogene-driven16 spontaneous breast cancer model (MMTV-neu/Rosa26-RFP-GFP/Fsp1-Cre, tri-Neu mice). The Neu (ErbB-2) proto-oncogene is associated with 20–30% of human breast cancers, and MMTV–Neu transgenic mice spontaneously develop focal adenocarcinomas resembling the human luminal phenotype after an extended latency, at 6–8 months of age. Lung metastases are frequently (72%) observed in these mice at 9–12 months of age. Mirroring the tri-PyMT model, Neu+ tumour cells in both primary and metastatic lesions of tri-Neu mice were RFP+ and epithelial (E-cad+/Vim−) (Extended Data Fig. 2). Therefore, the absence of EMT during metastasis formation appears to be independent of the driving oncogene, manifesting in both PyMT- and Neu-driven tumours.

To overcome the limitation of relying solely on Fsp1–Cre to report EMT, we obtained the vimentin–CreER transgenic mouse, which has been used to trace mesenchymal lineage cells during liver fibrosis17, and generated an additional EMT lineage tracing model (tri-PyMT/Vim mice, MMTV-PyMT/Rosa26-RFP-GFP/Vimentin-creER). After continuous induction of Cre activity by Tamoxifen injection (2 mg, intraperitoneal, three times per week starting when primary tumours appear at 8 weeks of age), the majority of tumour cells in both primary and metastatic lesions of tri-PyMT/Vim mice were RFP+ (Extended Data Fig. 3), suggesting that the vimentin promoter was not activated during lung metastasis formation. EMT marker staining also confirmed the epithelial phenotype (E-cad+/Vim−) of tumour cells in both primary and metastatic lesions (Extended Data Fig. 3).

Together, results from two oncogene-driven metastatic tumour models (MMTV–PyMT and MMTV–Neu) and two independent mesenchymal-specific reporters (Vim–CreER and Fsp1–Cre) suggest that EMT does not significantly contribute to the development of lung metastases.

Validating EMT lineage tracing

To evaluate the specificity and sensitivity of the EMT lineage tracing system, we established a cell line from tri-PyMT breast tumours. In culture, RFP+ tri-PyMT cells switched their fluorescent marker expression to GFP, as indicated by the appearance of an RFP+/GFP+ double-positive transitioning population (Fig. 2a). This spontaneous switch is consistent with the culture conditions: the cells were grown in 10% FBS, and serum is enriched in many EMT-promoting factors, including transforming growth factors (TGFs)18. Moreover, adding TGF-β1 under low-serum conditions (2% FBS) increased the proportion of GFP+ cells (Extended Data Fig. 4a). In concert with the fluorescent marker switch, tri-PyMT cells changed their morphology from cobblestone-like clusters of epithelial cells to dispersed, spindle-shaped mesenchymal cells (Fig. 2b). Reflecting these morphological differences, GFP+ cells were more motile than RFP+ cells (Extended Data Fig. 4b).

Figure 2: The EMT lineage tracing system reports EMT in tumour cells with high fidelity.

a, Scatter plots from flow cytometry analysis of tri-PyMT primary tumour cells, depicting GFP+ and RFP+ populations in the primary tumour immediately after sorting of RFP+ cells (P1), and after ten passages in culture with 10% FBS (P10 + 10% FBS). Numbers indicate the percentage of RFP+ and GFP+ cells in the total population. b, Phase contrast/fluorescent overlay image of tri-PyMT cells in culture. Scale bar, 50 μm. c, Western blot of sorted RFP+ and GFP+ tri-PyMT cells for E-cadherin, vimentin and β-actin as a loading control. Representative of two individual experiments. For original gel images, see Supplementary Fig. 1. d, Representative imaging of GFP+ and RFP+ tumour cells in primary tumours (PT) and lung metastases (LM) in the orthotopic model (n = 8 mice). Arrow indicates scattered GFP+ EMT tumour cells in the primary tumour. Scale bars, 100 μm (PT) and 50 μm (LM). e, qRT–PCR analysis of relative expression of EMT markers in RFP+ and GFP+ cells sorted from orthotopic tri-PyMT primary tumours. Gapdh served as the internal control. E-cadherin is encoded by the Cdh1 gene. Occludin is encoded by the Ocln gene. Data are reported as mean ± s.e.m., n = 4 primary tumours.

The fidelity of the EMT lineage tracing system was confirmed by analysing EMT marker expression in sorted RFP+ and GFP+ tri-PyMT cells. As determined by quantitative reverse transcription PCR (qRT–PCR), RFP+ cells expressed elevated levels of epithelial markers, including E-cadherin and Occludin, whereas GFP+ cells expressed several mesenchymal markers, including vimentin, FSP1, Twist, Zeb1 and Zeb2 (Extended Data Fig. 4c). Both RFP+ and GFP+ tri-PyMT cells expressed the PyMT oncogene. Consistently, western blot analysis confirmed the differential expression of E-cadherin and vimentin in RFP+ and GFP+ cells (Fig. 2c). Flow cytometry for E-cadherin revealed that the majority of E-cadherin− cells were GFP+ (97.4%) (Extended Data Fig. 4d). Of note, the E-cadherin+ cells were either RFP+ (93.6%) or RFP+/GFP+ (6.0%), demonstrating that tumour cells switch their fluorescent marker expression before they lose epithelial markers, and validating the early reporting of EMT in our system. These results confirm that the Fsp1–Cre-mediated fluorescent marker switch in tumour cells reports EMT with high fidelity and efficiency.

Rare EMT events in tumour progression

In the triple-transgenic models, widespread GFP expression in the tumour microenvironment precluded detection of potentially rare GFP+ tumour cells. To confine fluorescence to tumour cells, we established an orthotopic model by implanting purified RFP+ tri-PyMT cells in wild-type mice (Extended Data Fig. 5a, b). Consistent with observations in the triple-transgenic mice, primary tumours contained RFP+ epithelial cells (Fig. 2d). In addition, GFP+ cells were detected, indicating tumour EMT (Fig. 2d and Extended Data Fig. 5c, upper panel). These cells lacked E-cadherin (Extended Data Fig. 5c, upper panel) and made up 1.98 ± 1.40% (n = 6) of total tumour cells (Extended Data Fig. 5d). qRT–PCR analysis of EMT markers in sorted RFP+ and GFP+ cells from the same primary tumours confirmed the mesenchymal phenotype of the GFP+ cells (Fig. 2e). Importantly, these GFP+ EMT tumour cells did not contribute to lung metastasis: early disseminated tumour cells detected in the lungs were epithelial and RFP+ (Extended Data Fig. 5c, middle panel), and all 28 lung nodules detected in 8 mice maintained the epithelial phenotype (Fig. 2d and Extended Data Fig. 5c, lower panel).

We also established an orthotopic tri-PyMT/Vim model, wherein Tamoxifen was administered directly after orthotopic injection to ensure immediate tracing of EMT events. Consistently, the majority of tumour cells in both primary and metastatic tumours were RFP+ and epithelial (Extended Data Fig. 6). Again, GFP+ EMT events (4.46 ± 1.0% of total tumour cells, n = 3) were detected in the primary tumours.

To further dissect the metastatic cascade, we quantified the relative numbers of RFP+ and GFP+ cells in the primary tumour, blood and metastases of the tri-PyMT orthotopic model by flow cytometry. We observed an RFP:GFP ratio of ~100:1 in the primary tumour and ~15:1 in the blood (Extended Data Fig. 7a, b). However, this enrichment of GFP+ cells in the circulation did not translate into an advantage in metastatic outgrowth, as the RFP:GFP ratio in the lung was ~150:1. Altogether, these findings are consistent with our observations in the triple-transgenic models, suggesting that the majority of breast tumour cells persist in an epithelial state during primary tumour growth and lung metastasis formation.
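Because these populations are reported as ratios rather than percentages, a short calculation helps compare them. The sketch below (hypothetical helper name; it assumes RFP+ and GFP+ cells are the only tumour cells counted) converts each approximate RFP:GFP ratio into a GFP+ percentage and compares the odds of a tumour cell being GFP+ between sites.

```python
def gfp_percent(rfp_per_gfp):
    """Percentage of tumour cells that are GFP+ for an RFP:GFP ratio of rfp_per_gfp:1."""
    return 100.0 / (rfp_per_gfp + 1.0)


# Approximate RFP:GFP ratios reported for the tri-PyMT orthotopic model.
ratios = {"primary tumour": 100.0, "blood": 15.0, "lung": 150.0}

for site, r in ratios.items():
    print(f"{site}: {gfp_percent(r):.2f}% GFP+")

# Odds ratios relative to the primary tumour (>1 means GFP+ cells are enriched).
enrichment_blood = ratios["primary tumour"] / ratios["blood"]  # about 6.7-fold
enrichment_lung = ratios["primary tumour"] / ratios["lung"]    # about 0.67-fold
```

On these numbers, GFP+ cells rise from roughly 1% of tumour cells in the primary tumour to about 6% in the blood, then fall back below 1% in the lung.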

EMT inhibition and metastasis formation

Despite extensive characterization of the EMT reporter system, there remained a possibility that our reporter failed to capture all EMT events in vivo. We therefore sought to inhibit EMT and determine its impact on metastasis. We ectopically expressed miR-200, a well-known inhibitor of EMT that directly targets Zeb1 and Zeb2, the transcriptional repressors of E-cadherin19,20. We posited that stable expression of miR-200 in tri-PyMT cells would block EMT and lock tumour cells in a permanent epithelial state. Compared with control cells, miR-200-overexpressing cells (Extended Data Fig. 7c) showed elevated expression of epithelial markers and reduced expression of mesenchymal markers (Extended Data Fig. 7d). As expected, miR-200 overexpression inhibited the RFP-to-GFP conversion (>90% remaining RFP+, Fig. 3a). These results confirm that miR-200 effectively suppresses EMT in tri-PyMT cells.

Figure 3: miR-200 inhibition of EMT in tri-PyMT cells did not impact lung metastasis.

a, Flow cytometry analysis of tri-PyMT control and miR-200-expressing cells, indicating the percentage of RFP+ and GFP+ cells. b, Representative histologic lung images from tri-PyMT control and miR-200-expressing orthotopic mice (n = 5). Scale bar, 1.5 mm. c, Quantification of lung metastasis formation (number of individual nodules) in tri-PyMT control and miR-200-expressing tumour-bearing mice (n = 5). Data reported as the mean ± s.e.m.

To explore the impact of inhibiting EMT on metastasis formation in vivo, we orthotopically injected miR-200-overexpressing tri-PyMT cells. We identified 18 metastases in 5 mice, a similar number per mouse to that observed in mice bearing control tri-PyMT cells (28 metastases in 8 mice) (Fig. 3b, c). These results demonstrate that inhibition of EMT by miR-200 overexpression does not impair the ability of tumour cells to form distant lung metastases.
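The comparison above is a comparison of per-mouse rates. A minimal arithmetic check follows (hypothetical helper name; this is a simple ratio only and implies no statistical test).

```python
def nodules_per_mouse(nodules, mice):
    """Mean number of lung nodules per mouse."""
    return nodules / mice


control_rate = nodules_per_mouse(28, 8)  # control tri-PyMT cells
mir200_rate = nodules_per_mouse(18, 5)   # miR-200-overexpressing cells
rate_ratio = mir200_rate / control_rate  # close to 1 means no reduction

print(f"control {control_rate:.1f}, miR-200 {mir200_rate:.1f}, ratio {rate_ratio:.2f}")
```

This gives 3.5 and 3.6 nodules per mouse, a rate ratio of about 1.03.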

EMT is involved in chemoresistance

Emerging evidence suggests a molecular and phenotypic association between EMT and chemoresistance in several cancers21,22,23. Compellingly, residual breast cancers following chemotherapy display a mesenchymal phenotype and tumour-initiating features23. To determine whether the acquisition of chemoresistance is accompanied by molecular changes consistent with EMT, we evaluated the orthotopic tri-PyMT model under chemotherapy. Animals with established primary tumours were treated with cyclophosphamide (CTX), a drug commonly used in breast cancer treatment24 (100 mg kg−1, once per week, for two weeks before and two weeks after surgery; Fig. 4a). The tumours responded to chemotherapy, showing a 60% reduction in growth and markedly enhanced apoptosis (Extended Data Fig. 8a–c). Of note, in CTX-treated mice, RFP+ cells were more proliferative and more apoptotic than GFP+ cells (Extended Data Fig. 8d–g), suggesting that GFP+ cells are less susceptible to chemotherapy. Nevertheless, the percentage of GFP+ cells in the primary tumour remained unchanged under CTX treatment (Extended Data Fig. 8h).

Figure 4: EMT tumour cells are resistant to chemotherapy.

a, Schema of CTX treatment in tri-PyMT orthotopic model. Mice bearing an RFP+ primary tumour were treated with CTX (100 mg kg−1, once per week, for 4 weeks, as indicated by blue arrows). After 2 weeks of treatment, primary tumour (PT) was removed (black arrow). Lung metastasis growth was permitted for 4 weeks post CTX treatment. Fluorescent imaging of lungs revealed the contribution of GFP+ tumour cells to lung metastases (n = 9 mice). b, Ratio of GFP+ to RFP+ cells in early metastatic lungs (4 weeks post orthotopic injection) of untreated control and CTX-treated mice as quantified by flow cytometry (n = 4, *P < 0.05). Data reported as the mean ± s.e.m. c, Apoptosis (as measured by Annexin binding) of RFP+ and GFP+ tri-PyMT cells treated with CTX (n = 2 biological replicates). d, Flow cytometry scatter plot showing the proportions of RFP+ and GFP+ tri-PyMT cells before intravenous injection. Mice were treated with CTX (100 mg kg−1 per week for 3 weeks, n = 5 mice per group). e, Quantification of flow cytometry data showing the percentage of RFP+ and GFP+ tumour cells (red and green bars, respectively) of total cells in the lung of control and CTX-treated mice (n = 5 mice per group, *P < 0.05). f, Quantification of flow cytometry data showing the ratio of GFP+ to RFP+ cells in lungs of control and CTX-treated mice. Black line represents the starting ratio of GFP+ to RFP+ cells before injection as derived from the data in Fig. 4d (*P < 0.05). Data reported as the mean ± s.e.m.

Remarkably, in the early metastatic lungs (four weeks after tumour inoculation), flow cytometry analysis revealed a 2.7:1 ratio of GFP+ to RFP+ cells in CTX-treated mice (Fig. 4b). Four weeks after cessation of treatment, GFP+ tumour cells made a notable contribution to 5 of 17 metastatic lesions (Fig. 4a). This contrasts with untreated mice, in which all metastatic lesions were derived from RFP+ cells (Fig. 2d), and suggests that EMT may be involved in metastatic outgrowth in the context of chemotherapy.

To evaluate the effects of CTX on the EMT and non-EMT cell populations, we incubated sorted GFP+ and RFP+ cells in vitro with 4-hydroperoxy cyclophosphamide, a preactivated form of CTX (which otherwise requires hepatic activation); GFP+ cells were markedly more resistant to both short- and long-term treatment (Fig. 4c and Extended Data Fig. 9a, b). The selective advantage of mesenchymal tumour cells under chemotherapy was then corroborated by a competitive survival assay in vivo (Fig. 4d). Mice were injected intravenously with equal numbers of RFP+ and GFP+ cells and immediately received CTX (100 mg kg−1, once per week). After three weeks, lungs were harvested and the ratio of RFP+ to GFP+ cells was assessed by flow cytometry. CTX significantly inhibited the outgrowth of lung metastases from both RFP+ and GFP+ cells (Fig. 4e). The lungs of untreated mice were overwhelmed with tumours, with nearly 80% of the tumour cells detected as RFP+. Conversely, in CTX-treated mice more than 60% of the surviving tumour cells were GFP+, producing a significantly higher GFP:RFP ratio (Fig. 4f). These results indicate that GFP+ EMT cells are more resistant to chemotherapy both in vitro and in vivo.
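The shift in the competitive assay can be expressed as a change in the GFP:RFP ratio. The sketch below uses illustrative round numbers derived from the text (taking "nearly 80% RFP+" as 80% RFP+/20% GFP+ in untreated lungs, and the stated lower bound of 60% GFP+ in CTX-treated lungs) and assumes an injected ratio of 1:1 from the equal cell numbers; the measured starting ratio is the one shown in Fig. 4d.

```python
def gfp_to_rfp(gfp_percent, rfp_percent):
    """GFP:RFP ratio from the percentages of GFP+ and RFP+ tumour cells."""
    return gfp_percent / rfp_percent


injected = 1.0                        # assumed: equal numbers of GFP+ and RFP+ cells
untreated = gfp_to_rfp(20.0, 80.0)    # illustrative: ~80% RFP+ without CTX
ctx_treated = gfp_to_rfp(60.0, 40.0)  # illustrative: >60% GFP+ after CTX

# Relative to the injected mixture, RFP+ cells outgrow GFP+ cells without
# treatment (ratio < 1), whereas GFP+ cells are selected under CTX (ratio > 1).
fold_shift = ctx_treated / untreated
print(untreated / injected, ctx_treated / injected, fold_shift)
```

With these illustrative inputs, CTX shifts the GFP:RFP ratio about six-fold, from 0.25 to 1.5.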

Immunostaining revealed that in untreated mice, both RFP+ and GFP+ cells formed epithelial metastatic lesions (E-cad+/Vim−) (Extended Data Fig. 9c). Given the mesenchymal phenotype of GFP+ cells before injection, this suggests that GFP+ tumour cells underwent MET in the metastatic organ. By contrast, in CTX-treated mice the majority of surviving tumour cells were scattered mesenchymal GFP+ cells (E-cad−/Vim+) (Extended Data Fig. 9d). Together, these observations suggest that EMT tumour cells that sustain a mesenchymal phenotype are resistant to chemotherapy.

To begin to investigate the molecular basis of mesenchymal tumour cell resistance, we analysed the transcriptomic changes of EMT tumour cells. We sorted RFP+ and GFP+ cells and performed RNA sequencing (Supplementary Information Table 1). In addition to the expected changes in EMT marker expression (Extended Data Fig. 10a), the expression of many cell-proliferation-related genes was reduced in GFP+ cells (Extended Data Fig. 10b), mirroring their reduced proliferation in vivo. GFP+ cells also showed increased expression of established chemoresistance-related factors, including IL6, Periostin, Enpp2 and Pdgfr25,26,27,28. Additionally, CTX-treated GFP+ cells upregulated many genes involved in drug metabolism and efflux, including drug transporters (Abcb1a, Abcb1b and Abcc1), aldehyde dehydrogenases (ALDHs), cytochrome P450s and glutathione-metabolism-related enzymes (Extended Data Fig. 10c). The main toxicity of CTX is due to its metabolite phosphoramide mustard. Because ALDH converts the CTX metabolite aldophosphamide into the non-toxic carboxyphosphamide29, phosphoramide mustard is formed predominantly in cells with low ALDH levels. In accordance with the transcriptomic data, GFP+ cells had significantly higher ALDH activity than RFP+ cells (Extended Data Fig. 10d). These properties of GFP+ EMT tumour cells (reduced proliferation, increased apoptotic resistance, and upregulation of chemoresistance and drug-metabolizing genes) may contribute to their insensitivity to CTX. Notably, GFP+ cells were also refractory to other commonly used chemotherapies, including doxorubicin, paclitaxel and fluorouracil (Extended Data Fig. 10e).
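The ALDH argument rests on a branch point: aldophosphamide is either oxidized by ALDH to inactive carboxyphosphamide or decomposes spontaneously to phosphoramide mustard (and acrolein). As a simplified sketch, assuming both routes behave as competing first-order reactions with hypothetical rate constants (real kinetics also depend on uptake, transport and other enzymes), the fraction that becomes the toxic mustard is k_elim / (k_elim + k_ALDH).

```python
def mustard_fraction(k_elim, k_aldh):
    """Fraction of aldophosphamide converted to phosphoramide mustard.

    Two competing first-order pathways are assumed:
      k_elim: spontaneous elimination to phosphoramide mustard (+ acrolein)
      k_aldh: ALDH-catalysed oxidation to carboxyphosphamide (pseudo-first-order,
              proportional to ALDH activity)
    Rate constants are hypothetical and in arbitrary, matching units.
    """
    return k_elim / (k_elim + k_aldh)


low_aldh = mustard_fraction(k_elim=1.0, k_aldh=0.25)  # hypothetical low-ALDH cell
high_aldh = mustard_fraction(k_elim=1.0, k_aldh=4.0)  # hypothetical high-ALDH cell
print(f"low ALDH: {low_aldh:.0%} mustard, high ALDH: {high_aldh:.0%} mustard")
```

In this toy model, a 16-fold difference in ALDH activity cuts the mustard yield from 80% to 20%, illustrating qualitatively why higher ALDH activity in GFP+ cells could blunt CTX toxicity.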

To test whether EMT is required for the generation of CTX resistance, we first examined in vitro the effect of treatment on control and miR-200-overexpressing tri-PyMT cells. With increasing concentrations of CTX, miR-200 cells were significantly more susceptible to therapy (Fig. 5a). We then extended this finding in vivo by establishing orthotopic control and miR-200 primary tumours and applying the pre- and post-surgery CTX regimen. Blocking EMT in tumour cells effectively abolished metastatic growth (Fig. 5b, c). Thus, EMT contributes to the development of chemoresistant metastasis.

Figure 5: miR-200 overexpression abrogates CTX resistance.

a, Sensitivity of control and miR-200-expressing tri-PyMT tumour cells to CTX treatment, as measured by CellTiter-Glo (n = 4 biological replicates per condition). b, Representative histologic lung images in CTX-treated tri-PyMT control and miR-200-expressing tumour-bearing mice (n = 5). Scale bar, 1.5 mm. c, Quantification of lung metastasis formation (number of individual nodules) in CTX-treated tri-PyMT control and miR-200-expressing tumour-bearing mice (n = 5). Data reported as the mean ± s.e.m.

Discussion

Using two independent EMT lineage tracing strategies in two disparate oncogene-driven autochthonous models of breast cancer, we demonstrated that lung metastases are derived from non-EMT tumour cells, contradicting the original EMT/MET hypothesis2,30. A similar tracing system identified EMT in primary tumours, but the mesenchymal lineage status of the metastatic nodules was not pursued31. In our models, tumour cells disseminated and formed metastases while retaining their epithelial phenotype, in accordance with a recent study32. To underline that EMT is not required for metastasis, overexpression of miR-200 (a microRNA paradoxically associated with both reduced invasion19,20 and increased metastasis33) suppressed the EMT-promoting transcription factors Snail1/2, Twist, Zeb1 and Zeb2, but had no effect on metastasis. Given that both epithelial and mesenchymal tumour cells can disseminate, it is plausible that the larger fraction of highly proliferative epithelial cells outcompetes the minor EMT tumour cell population in generating macrometastatic lesions.

Until now, most data connecting EMT with chemoresistance were derived from in vitro studies or clinical prognostic data. Here we demonstrate that highly proliferative non-EMT cells are sensitive to chemotherapy, and we observe the emergence of recurrent EMT-derived metastases after treatment. There is great emphasis on developing EMT-targeting therapies34,35, and our studies suggest that although EMT blockade may not affect metastasis formation, specifically targeting EMT tumour cells could be synergistic with conventional chemotherapy. Thus, our EMT lineage tracing system provides a unique preclinical platform for developing combination therapies aimed at eliminating both populations and combating chemoresistance.

Methods

Animals

Wild-type C57BL/6 and FVB/n mice, and transgenic mice with ACTB–tdTomato–eGFP (stock no. 007676), Fsp1–Cre (stock no. 012641), MMTV–PyMT (stock no. 002374), and MMTV–Neu (stock no. 002376) were obtained from The Jackson Laboratory. The vimentin–CreER mouse was a kind gift from the laboratory of R. F. Schwabe at Columbia University. CB-17 SCID mice were obtained from Charles River Laboratories. All mouse strains obtained were bred in the animal facility at Weill Cornell Medical College. All animal work was conducted in accordance with a protocol approved by the Institutional Animal Care and Use Committee at Weill Cornell Medical College.

The ACTB–tdTomato–EGFP and Fsp1–Cre mice were bred together to obtain double-transgenic mice, which were then bred with MMTV–PyMT or MMTV–Neu mice to obtain the tri-PyMT and tri-Neu triple-transgenic mice, respectively. Double-transgenic male mice carrying ACTB–tdTomato–eGFP and MMTV–PyMT were crossed with vimentin–CreER mice to obtain the tri-PyMT/Vim triple-transgenic mice. Genotyping for each transgenic line was performed following the standard protocols described on The Jackson Laboratory website. Genotyping for vimentin–CreER was done using forward primer 5′-CCCCTTCCTCACTTCTTTCC-3′ and reverse primer 5′-ATGTTTAGCTGGCCCAAATG-3′.

Tamoxifen injection

To induce vimentin–CreER activity in the tri-PyMT/Vim mice, Tamoxifen (Sigma-Aldrich, 2 mg per mouse, dissolved in corn oil) was administered through intraperitoneal injections, three times per week starting when the primary tumours appear (at 8 weeks of age) and continuing for 6 weeks until metastasis developed in the lung.

Establishing tri-PyMT cell line

The primary tumour of a tri-PyMT mouse (12-week-old female) was surgically removed under sterile conditions. Tumour tissue was cut into ~1 mm³ blocks and implanted into the fat pad (no. 4, right side) of CB-17 SCID mice. The resulting secondary tumour was used to establish the tri-PyMT cell line, which eliminated contamination by the fluorescent stromal cells present in tumour tissue from tri-PyMT transgenic mice.

Tumour tissue was minced and digested with an enzyme cocktail (collagenase A, elastase and DNase I, Roche Applied Science) in HBSS buffer at 37 °C for 30 min. The cell suspension was passed through a 40-μm cell strainer (BD Biosciences). Cells were washed three times with PBS and loaded into the Aria III cell sorter (BD Biosciences). The sorted RFP+ cells were cultured in DMEM supplemented with 10% fetal bovine serum. PyMT oncogene expression in the established cell line was confirmed by RT–PCR (Extended Data Fig. 4c). The tumorigenic ability of these cells was confirmed throughout the study.

To determine EMT induction by TGF-β, cells were cultured for one week in DMEM with 2% FBS and 2 ng ml−1 TGF-β1 (R&D Systems). The GFP+ cell ratio was quantified by flow cytometry.

To generate the miR-200-overexpressing cell line, a pLenti 4.1 Ex miR-200b-200a-429 construct20 was obtained from Addgene. To prevent the construct's own fluorescent marker from confounding the RFP/GFP reporter, the GFP gene in this construct was removed by BstBI/XbaI digestion followed by blunting and self-ligation. Lentivirus was packaged by co-transfecting the pLenti-miR-200 construct and packaging plasmids into HEK293T cells. tri-PyMT cells (passage 2) were infected with the lentivirus, and infected cells (tri-PyMT miR-200) were selected with puromycin (2 μg ml−1) for 14 days. A control tri-PyMT cell line was generated in parallel by the same procedure, using lentivirus carrying the puromycin resistance gene.

Orthotopic breast tumour model

To establish an orthotopic breast tumour model, we first purified RFP+ cells from passages 10–15 of tri-PyMT cell culture by FACS. The purified RFP+ tri-PyMT cells (1 × 10⁶ cells, purity >99%, Extended Data Fig. 5a) were injected into the mammary fat pad of 8-week-old female CB-17 SCID mice. Primary tumour growth was monitored weekly by external calliper measurement. After approximately 4 weeks, the primary tumour was surgically removed and the incision was closed with wound clips. Tumour size did not exceed 5% of total body weight, as permitted by the IACUC protocol. Animals were euthanized 4 weeks after primary tumour removal to analyse the development of pulmonary metastasis. For animals subjected to chemotherapy, cyclophosphamide (CTX, Sigma-Aldrich, 100 mg kg−1) was administered once per week for 2 weeks before and 2 weeks after surgery.
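The per-animal CTX dose and the tumour-burden limit both scale with body weight. A minimal sketch, assuming a hypothetical 20 g mouse, a tumour volume estimated from calliper length and width with the commonly used formula V = L × W² / 2 (not stated in this protocol), and a tissue density of about 1 g cm−3:

```python
def ctx_dose_mg(body_weight_g, dose_mg_per_kg=100.0):
    """Absolute CTX dose (mg) for a mouse of the given body weight (g)."""
    return dose_mg_per_kg * body_weight_g / 1000.0


def tumour_volume_mm3(length_mm, width_mm):
    """Calliper-based volume estimate, V = L * W^2 / 2 (assumed formula)."""
    return length_mm * width_mm ** 2 / 2.0


def within_burden_limit(volume_mm3, body_weight_g, limit_fraction=0.05):
    """True if the estimated tumour mass is within 5% of body weight.

    Assumes a density of ~1 g/cm^3, so 1000 mm^3 of tumour weighs ~1 g.
    """
    tumour_mass_g = volume_mm3 / 1000.0
    return tumour_mass_g <= limit_fraction * body_weight_g


weight_g = 20.0                         # hypothetical mouse
dose = ctx_dose_mg(weight_g)            # mg per weekly injection
volume = tumour_volume_mm3(12.0, 10.0)  # hypothetical calliper readings
print(dose, volume, within_burden_limit(volume, weight_g))
```

For this hypothetical mouse, each injection is 2 mg of CTX and the tumour-burden ceiling is about 1 g (roughly 1000 mm³).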

Tissue processing, immunofluorescence and microscopy

The harvested primary tumours and PBS-perfused lungs bearing metastases were fixed in 4% paraformaldehyde overnight, followed by 30% sucrose for 2 days, and then embedded in Tissue-tek O.C.T. embedding compound (Electron Microscopy Sciences). Serial sections (10 μm, at least 10 sections) were prepared for histological analysis by haematoxylin and eosin staining, and immunofluorescent staining following standardized protocols.

Primary antibodies used in this study include CD45 (30-F11, BioLegend), E-cadherin (DECMA-1, BioLegend), vimentin (sc-7557, Santa Cruz), PyMT (ab15085, Abcam), Neu (sc-284, Santa Cruz), Ki67 (ab15580, Abcam) and active caspase-3 (C92-605, BD Pharmingen). Primary antibodies were directly conjugated to Alexa Fluor 647 using an antibody labelling kit (Invitrogen) according to the manufacturer's instructions and purified over BioSpin P30 columns (Bio-Rad). GFP+ and RFP+ cells were detected by their intrinsic fluorescence.

Fluorescent images were obtained using a computerized Zeiss fluorescent microscope (Axiovert 200M), fitted with an apotome and an HRM camera. Images were analysed using Axiovision 4.6 software (Carl Zeiss).

Flow cytometry and cell sorting

For the metastatic lungs and primary tumours, cell suspensions were prepared by digesting tissues with an enzyme cocktail (collagenase A, elastase, and DNase I, Roche Applied Science) in HBSS buffer at 37 °C for 30 min. For cultured cells, cells were collected through trypsinization. A single-cell suspension was prepared by filtering through a 30-μm cell strainer (BD Biosciences). Then cells were stained following a standard immunostaining protocol. In brief, cells were pre-blocked with 2% FBS plus Fc block (CD16/CD32, 1:30, BD Biosciences) and then incubated with the primary antibody against E-cadherin (DECMA-1, BioLegend). SYTOX Blue (Invitrogen) was added to the staining tube in the last 5 min to facilitate the elimination of dead cells. GFP+ and RFP+ cells were detected by their intrinsic signals. The stained samples were analysed using the LSRII flow cytometer coupled with FACS Diva software (BD Biosciences). Flow cytometry analysis was performed using a variety of controls including isotype antibodies, unstained and single-colour stained samples for determining appropriate gates, voltages and compensations required in multivariate flow cytometry.
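The gating logic described above (exclude SYTOX Blue+ dead cells, then score RFP and GFP by intrinsic fluorescence) can be sketched in a few lines. This is a hypothetical illustration, not FACS Diva output: the intensity thresholds stand in for gates that, in practice, are set from unstained and single-colour controls, and compensation is assumed to have been applied already.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    """One compensated flow cytometry event (arbitrary fluorescence units)."""
    sytox_blue: float
    rfp: float
    gfp: float


def classify(event, sytox_gate, rfp_gate, gfp_gate):
    """Assign an event to a population using simple threshold gates."""
    if event.sytox_blue > sytox_gate:
        return "dead"  # SYTOX Blue enters cells with compromised membranes
    rfp_pos = event.rfp > rfp_gate
    gfp_pos = event.gfp > gfp_gate
    if rfp_pos and gfp_pos:
        return "RFP+GFP+"  # transitioning cells
    if gfp_pos:
        return "GFP+"
    if rfp_pos:
        return "RFP+"
    return "unlabelled"  # e.g. non-fluorescent host cells


def live_percentages(events, **gates):
    """Percentage of each population among live (SYTOX Blue-negative) events."""
    live = [label for label in (classify(e, **gates) for e in events) if label != "dead"]
    if not live:
        return {}
    counts = Counter(live)
    return {label: 100.0 * n / len(live) for label, n in counts.items()}


gates = dict(sytox_gate=100.0, rfp_gate=100.0, gfp_gate=100.0)
events = [Event(10, 800, 5), Event(10, 900, 3), Event(12, 4, 700),
          Event(11, 600, 650), Event(950, 700, 4)]
print(live_percentages(events, **gates))
```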

To sort live cells for further culture or injection into animals, we used the Aria II cell sorter with FACS Diva software (BD Biosciences). Cells were prepared for sorting under sterile conditions. The purity of sorted subpopulations was confirmed by re-analysing post-sort samples on the sorter.

Quantitative RT–PCR analysis

Total RNA was extracted using the RNeasy Kit (Qiagen), and miRNA using the mirVana miRNA isolation kit (Life Technologies); RNA was reverse-transcribed into cDNA using qScript cDNA SuperMix (Quanta Biosciences). qPCR was performed with the primers listed below and iQ SYBR Green master mix (Bio-Rad) on a Bio-Rad CFX96 Real-Time System coupled with Bio-Rad CFX Manager software. PCR protocol: initial denaturation at 95 °C for 3 min; 40 cycles of 95 °C for 20 s, 60 °C for 30 s and 72 °C for 30 s; final extension at 72 °C for 5 min; followed by melt curve analysis. Primers used are as follows: Gapdh, forward 5′-GGTCCTCAGTGTAGCCCAAG-3′, reverse 5′-AATGTGTCCGTCGTGGATCT-3′; Cdh1 (E-cadherin), forward 5′-ACACCGATGGTGAGGGTACACAGG-3′, reverse 5′-GCCGCCACACACAGCATAGTCTC-3′; Ocln, forward 5′-TGCTAAGGCAGTTTTGGCTAAGTCT-3′, reverse 5′-AAAAACAGTGGTGGGGAACGTG-3′; Vim, forward 5′-TGACCTCTCTGAGGCTGCCAACC-3′, reverse 5′-TTCCATCTCACGCATCTGGCGCTC-3′; Cdh2 (N-cadherin), forward 5′-AAAGAGCGCCAAGCCAAGCAGC-3′, reverse 5′-TGCGGATCGGACTGGGTACTGTG-3′; Fsp1, forward 5′-CCTGTCCTGCATTGCCATGAT-3′, reverse 5′-CCCACTGGCAAACTACACCC-3′; Snai1, forward 5′-ACTGGTGAGAAGCCATTCTCCT-3′, reverse 5′-CTGGCACTGGTATCTCTTCACA-3′; Snai2, forward 5′-TTGCAGACAGATCAAACCTGAG-3′, reverse 5′-TGTTTATGCAGAAGCGACATTC-3′; Twist1, forward 5′-AGCTACGCCTTCTCCGTCTG-3′, reverse 5′-CTCCTTCTCTGGAAACAATGACA-3′; Zeb1, forward 5′-GATTCCCCAAGTGGCATATACA-3′, reverse 5′-TGGAGACTCCTTCTGAGCTAGTG-3′; Zeb2, forward 5′-TGGATCAGATGAGCTTCCTACC-3′, reverse 5′-AGCAAGTCTCCCTGAAATCCTT-3′; PyMT, forward 5′-ACTGCTACTGCACCCAGACA-3′, reverse 5′-CTGGAAGCCGGTTCCTCCTA-3′; GFP, forward 5′-CCACATGAAGCAGCACGACT-3′, reverse 5′-GGGTCTTGTAGTTGCCGTCG-3′; RFP, forward 5′-AGCGCGTGATGAACTTCGAG-3′, reverse 5′-CCGCGCATCTTCACCTTGTA-3′.
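The methods list the primers and the Gapdh internal control but not the quantification formula. Assuming the widely used comparative Ct (2^−ΔΔCt) method with roughly 100% amplification efficiency, relative expression could be computed as in the sketch below; the function name and Ct values are invented for illustration.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change of a target gene by the comparative Ct (2^-ddCt) method.

    ct_target, ct_ref: Ct of the target gene and the reference gene (e.g. Gapdh)
        in the sample of interest (e.g. sorted GFP+ cells).
    ct_target_cal, ct_ref_cal: the same two Ct values in the calibrator sample
        (e.g. sorted RFP+ cells from the same tumour).
    Assumes ~100% efficiency, i.e. the product doubles every cycle.
    """
    delta_ct_sample = ct_target - ct_ref
    delta_ct_cal = ct_target_cal - ct_ref_cal
    delta_delta_ct = delta_ct_sample - delta_ct_cal
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Vim Ct values: GFP+ sample vs RFP+ calibrator, Gapdh unchanged.
vim_fold = relative_expression(ct_target=20.0, ct_ref=15.0,
                               ct_target_cal=24.0, ct_ref_cal=15.0)
print(vim_fold)  # 4 fewer cycles relative to Gapdh, so 2**4 = 16-fold higher
```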

RNA-sequencing analysis

Total RNA was extracted from sorted RFP+ and GFP+ tri-PyMT cells with the RNeasy Kit (Qiagen). RNA-seq libraries were constructed and sequenced following standard Illumina protocols. Single-end RNA-seq reads were mapped to the UCSC mouse genome (GRCm38/mm10) using TopHat2. FPKM values for each gene were estimated with Cufflinks, and differential expression was assessed with Cuffdiff2. Heat maps of differentially expressed genes (adjusted P < 0.05) were drawn using the gplots R package.
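For readers unfamiliar with the quantities involved, the Python sketch below shows the textbook FPKM definition and the Benjamini–Hochberg false-discovery-rate adjustment, the correction Cuffdiff applies when reporting its q-values. It is a simplification: Cufflinks estimates FPKM with a likelihood model that accounts for isoforms and fragment-length distributions, so this is not a reimplementation of that pipeline, and all input values are hypothetical.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Basic FPKM: fragments per kilobase of transcript per million mapped fragments."""
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical gene-level P values; keep genes with adjusted P < 0.05.
pvals = [0.01, 0.04, 0.03, 0.2]
qvals = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(qvals) if q < 0.05]
print(significant)  # [0]
```

Note that the gene with raw P = 0.04 no longer passes the 0.05 threshold after adjustment, which is why the heat maps are filtered on adjusted rather than raw P values.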

Western blot analysis

Cells were homogenized in 1× RIPA lysis buffer (Millipore) containing protease inhibitors (Roche Applied Science). Samples were boiled in 1× Laemmli buffer containing 10% β-mercaptoethanol and loaded onto 12% gradient Tris-glycine gels (Bio-Rad). Western blotting was performed using antibodies specific for E-cadherin (clone DECMA-1), vimentin (clone RV202, BD Pharmingen) and β-actin (clone AC-15, Sigma-Aldrich).

Cell apoptosis and viability assays

To determine apoptosis of RFP+ and GFP+ cells, tri-PyMT cells (passage 10) were seeded on adherent six-well plates (1 × 10⁶ cells) and treated with 4-hydroperoxycyclophosphamide (Santa Cruz) for 48 h. After treatment, cells were trypsinized and stained with APC-conjugated Annexin V (BD Biosciences) and SYTOX Blue (Invitrogen) to label apoptotic and dead cells. Stained cells were analysed on the LSRII flow cytometer, and the percentages of apoptotic, dead and live RFP+ and GFP+ cells were quantified with FACS Diva software. To determine the viability of tri-PyMT control and miR-200-expressing cells treated with CTX, cells were plated in adherent black-walled 96-well plates (1 × 10⁴ cells) and treated with 4-hydroperoxycyclophosphamide for 48 h. Cell viability was then measured with the CellTiter-Glo Luminescent Cell Viability Assay (Promega).
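The source does not give the gate definitions used in FACS Diva. The Python sketch below assumes a common convention: SYTOX Blue-positive events count as dead, Annexin V-positive/SYTOX Blue-negative events as apoptotic, and double-negative events as live, each tallied within the RFP+ or GFP+ lineage gate. It also shows one simple way to express CellTiter-Glo luminescence as a percentage of untreated control wells. The thresholds, intensities, luminescence readings and the `Event` type are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Event:
    lineage: str       # "RFP" or "GFP" lineage gate (hypothetical labels)
    annexin_v: float   # APC-Annexin V intensity
    sytox: float       # SYTOX Blue intensity

def classify(event, annexin_cut, sytox_cut):
    """Assumed gating: SYTOX+ = dead; Annexin V+ SYTOX- = apoptotic; double negative = live."""
    if event.sytox >= sytox_cut:
        return "dead"
    if event.annexin_v >= annexin_cut:
        return "apoptotic"
    return "live"

def lineage_percentages(events, lineage, annexin_cut, sytox_cut):
    """Percentage of live, apoptotic and dead cells within one lineage gate."""
    counts = {"live": 0, "apoptotic": 0, "dead": 0}
    for e in events:
        if e.lineage == lineage:
            counts[classify(e, annexin_cut, sytox_cut)] += 1
    total = sum(counts.values())
    return {k: (100.0 * v / total if total else 0.0) for k, v in counts.items()}

def percent_viability(treated_rlu, control_rlu, background_rlu=0.0):
    """CellTiter-Glo luminescence of treated wells as % of untreated control wells."""
    return 100.0 * (treated_rlu - background_rlu) / (control_rlu - background_rlu)

events = [Event("GFP", 10, 10), Event("GFP", 500, 10),
          Event("GFP", 500, 800), Event("GFP", 10, 10), Event("RFP", 500, 10)]
print(lineage_percentages(events, "GFP", annexin_cut=100, sytox_cut=100))
# {'live': 50.0, 'apoptotic': 25.0, 'dead': 25.0}
print(percent_viability(5000, 10000))  # 50.0
```

Subtracting a cell-free background well (the optional `background_rlu`) keeps the viability scale anchored at 0% for wells with no ATP signal.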

Cell migration assay

1 × 10⁵ tri-PyMT cells were seeded in a six-well plate. Time-lapse images (phase, GFP and RFP channels) were acquired every 10 min for 10 h on a computerized Zeiss microscope (Axiovert observation). Individual cells (>10 RFP+ and >10 GFP+ cells per field; >2 fields analysed) were tracked with ImageJ software, and the distance travelled during that time was measured as indicated.
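The paragraph above does not specify whether "distance travelled" means the summed path along the track or the net displacement. The Python sketch below computes both from per-frame (x, y) coordinates of the kind exported by ImageJ tracking; which metric the authors used, and the pixel-to-µm calibration, are assumptions left open. With a frame every 10 min over 10 h, a complete track would have 61 positions (60 intervals). The coordinates below are hypothetical.

```python
import math

def path_length(track):
    """Total distance along a track of (x, y) positions, one position per frame."""
    return sum(math.dist(p, q) for p, q in zip(track, track[1:]))

def net_displacement(track):
    """Straight-line distance between the first and last positions."""
    return math.dist(track[0], track[-1])

def mean_speed_per_hour(track, frame_interval_min=10.0):
    """Path length divided by elapsed time, with frames every `frame_interval_min`."""
    elapsed_h = (len(track) - 1) * frame_interval_min / 60.0
    return path_length(track) / elapsed_h

# Hypothetical positions (µm) from three consecutive frames.
track = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
print(path_length(track))       # 11.0
print(net_displacement(track))
```

Path length rewards cells that move a lot in any direction, whereas net displacement favours directed migration; reporting which one is used avoids ambiguity when comparing RFP+ and GFP+ cells.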

ALDH activity assay

RFP+ and GFP+ tri-PyMT cells (1 × 10⁶ cells each) were freshly sorted from culture by FACS and then homogenized in cold ALDH Assay Buffer supplied with the ALDH Activity Colorimetric Assay Kit (Biovision Inc.). Following the manufacturer's protocol, ALDH substrate and acetaldehyde were added. ALDH activity was measured as optical density (OD) at 450 nm in kinetic mode (every 3 min for 60 min).
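The source reports only that OD450 was read in kinetic mode. A usual way to turn such a read into an activity, assumed here rather than taken from the source, is to fit the slope of background-corrected OD450 against time over the linear part of the read and convert it to nmol NADH per min with a NADH standard curve. The standard-curve slope, the readings and the function names below are hypothetical, and the kit's own protocol should take precedence.

```python
def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def aldh_rate(times_min, od450, blank_od450, od_per_nmol_nadh):
    """ALDH activity as nmol NADH per min from a kinetic OD450 read.

    blank_od450: readings from a background (no-substrate) well, subtracted point by point.
    od_per_nmol_nadh: slope of a NADH standard curve (hypothetical value here).
    """
    corrected = [s - b for s, b in zip(od450, blank_od450)]
    return linear_slope(times_min, corrected) / od_per_nmol_nadh

times = list(range(0, 61, 3))             # every 3 min for 60 min: 21 reads
sample = [0.10 + 0.002 * t for t in times]
blank = [0.05] * len(times)
print(aldh_rate(times, sample, blank, od_per_nmol_nadh=0.05))  # approximately 0.04
```

Dividing the resulting rate by the number of input cells (here 1 × 10⁶ per sample) would put RFP+ and GFP+ measurements on a per-cell basis.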

Statistical analysis

To determine the sample size of animal experiments, we used power analysis assuming . Therefore, all animal experiments were conducted with ≥5 mice per group to ensure adequate power for two-sample t-test comparisons between groups. Animals were randomized within each experimental group. Experiments were not performed blinded. Results are expressed as mean ± s.e.m. Data distributions within groups and significance between treatment groups were analysed using the Mann–Whitney U-test in GraphPad Prism software. P values <0.05 were considered significant. Error bars depict s.e.m., except where indicated otherwise.
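The power-analysis assumptions are missing from the source, so the Python sketch below cannot reproduce the authors' calculation. It shows the standard normal-approximation sample size for a two-sided, two-sample t-test given a standardized effect size d (this approximation slightly underestimates the exact t-test requirement), and an exact two-sided Mann–Whitney U test obtained by enumerating every relabelling of the pooled data, which is feasible for groups of about five mice. The effect size and data are hypothetical, and GraphPad Prism's own implementation may differ in detail, for example in tie handling or large-sample approximations.

```python
import math
from itertools import combinations
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group, two-sided two-sample t-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)

def u_statistic(x, y):
    """Mann-Whitney U for x: each pair with x > y scores 1, ties score 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def exact_mann_whitney_p(x, y):
    """Two-sided exact P value by enumerating all splits of the pooled data."""
    pooled = list(x) + list(y)
    n1 = len(x)
    centre = n1 * len(y) / 2            # expected U under the null hypothesis
    observed = abs(u_statistic(x, y) - centre)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(u_statistic(gx, gy) - centre) >= observed - 1e-9:
            extreme += 1
        total += 1
    return extreme / total

print(n_per_group(1.0))
treated = [6, 7, 8, 9, 10]   # hypothetical measurements, five mice per group
control = [1, 2, 3, 4, 5]
print(u_statistic(treated, control), exact_mann_whitney_p(treated, control))
```

For d = 1, α = 0.05 and 80% power the approximation gives 16 per group. Two completely separated groups of five give U = 25 and an exact two-sided P of 2/252 (about 0.008), the smallest P value attainable with five animals per group.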