Main

The goal of our experiment is twofold: first, to link cells in the embryo to their corresponding clones in adult tissue (Fig. 1a); second, to quantify cell-type composition of these clones to determine the multipotency of embryonic progenitors. To reach the first goal, we need to uniquely label the cells in an embryo with permanent and heritable labels. For this, we use CRISPR–Cas9 technology, which induces a double-stranded break at the targeted genomic site that is repaired as insertions or deletions of different lengths at different positions (scars)16,17. To allow for multiple scarring in the same cell, we use a zebrafish line with eight in-tandem copies of a histone–green fluorescent protein (GFP) transgene17 (Methods). Scarring starts after injecting the yolk or cell of the zygote with Cas9 RNA or protein, and a single-guide RNA (sgRNA) that targets GFP (Fig. 1b).

Figure 1: Single-cell clonal tracing in zebrafish.
a, Embryonic cells get permanent and unique labels that are transmitted to the clones in the adult. b, Zygote injection with Cas9 RNA or protein and sgRNA that targets GFP (GFP-sgRNA). H2A, histone 2A. c, Mean fraction of unscarred GFP as a function of time, computed over ten independently injected embryos (t > 6 h), and over three pools of ten embryos (t ≤ 6 h), in which GFP was PCR-amplified from gDNA. Error bars denote s.e.m. Unscarred GFP exponentially decreases at 0.064 ± 0.002 h−1 (0.294 ± 0.008 h−1) and is constant after 10 ± 1.0 h (3.1 ± 0.1 h) for RNA (protein) injections (solid lines, Supplementary Information section 1). d, ScarTrace protocol (Methods). IVT, in vitro transcription; RT, reverse transcription. cBC, cell-specific barcode for transcriptome; sBC, cell-specific barcode for scars.

We quantified the scarring rate by measuring the fraction of unscarred GFP in zebrafish embryos at different times after Cas9 delivery (Fig. 1c). Scarring is approximately five times faster after Cas9 protein injection than after RNA injection. Cas9 activity ceases at around 3 h for protein and at 10 h for RNA injections, when zebrafish embryos have about 1,000 and 8,000 cells, respectively2. We detect more than 1,000 distinct scars, the abundances and probabilities of which span several orders of magnitude13 (Supplementary Information sections 1 and 2).
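The kinetics reported in Fig. 1c can be illustrated with a minimal sketch. It assumes that the unscarred fraction starts at 1 at injection, decays as a single exponential at the reported rate, and stays constant once Cas9 activity ceases. The function name and this piecewise form are assumptions made for illustration; the actual fit is described in Supplementary Information section 1.

```python
import math

# Decay rates (per hour) and times at which scarring stops (hours), from Fig. 1c.
RATE = {"RNA": 0.064, "protein": 0.294}
T_STOP = {"RNA": 10.0, "protein": 3.1}

def unscarred_fraction(t_hours, delivery):
    """Assumed model: exp(-k * t) until scarring stops, constant afterwards."""
    k = RATE[delivery]
    # Clamp time to the window in which Cas9 is assumed to be active.
    t_eff = min(max(t_hours, 0.0), T_STOP[delivery])
    return math.exp(-k * t_eff)

if __name__ == "__main__":
    for delivery in ("RNA", "protein"):
        plateau = unscarred_fraction(24.0, delivery)
        print(f"{delivery}: plateau unscarred fraction ~ {plateau:.2f}")
    print(f"rate ratio protein/RNA ~ {RATE['protein'] / RATE['RNA']:.1f}")
```

The ratio of the two reported rates is about 4.6, which is the basis for describing protein-driven scarring as roughly five times faster.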

To detect scars and transcriptome from single cells, we developed ScarTrace, which integrates a nested PCR step after transcriptome conversion to cDNA into the sorting and robot-assisted transcriptome sequencing (SORT-seq) protocol18 (Fig. 1d). Because the histone–GFP transgene is transcribed, scars can be detected from mRNA and genomic DNA (gDNA). Detection from gDNA is preferred because GFP expression might be tissue specific, vulnerable to silencing and scars might affect the half-life of the mRNA. We assessed the efficiency of scar detection from mRNA and gDNA by comparing scar patterns of single cells from the caudal fin obtained using ScarTrace with and without reverse transcription (Fig. 1d, step 1). We detected 3.3 ± 0.3 (mean ± s.e.m.) scars per clone on average, and approximately 25% of the cells remained unscarred and therefore do not contain clonal information (Extended Data Fig. 1a). Clone sizes from gDNA and gDNA–mRNA detection are very similar (Extended Data Fig. 1a–d), indicating that ScarTrace reliably detects scars from gDNA in single cells.

We next used ScarTrace to explore the clonal composition of haematopoietic cells isolated from the whole kidney marrow (WKM) of two protein-injected (P1 and P2) and two RNA-injected (R1 and R2) zebrafish. We found one and two major clones in P1 and P2, and eight and six in R1 and R2, respectively (Fig. 2a, b, Extended Data Fig. 2a, b, Extended Data Table 1). This is a direct result of the time window of Cas9 activity (Fig. 1c). The number of observed clones agrees with previous findings using GESTALT8, in which a similar Cas9-mediated approach is used to label embryonic clones in zebrafish, and with the number of clones (between 10.4 and 15.4) found for haematopoietic stem and progenitor cells at 10–14 hours post fertilization (hpf) using Zebrabow19.

Figure 2: Few clones produce haematopoietic cells.

a, b, Scar percentage per cell for fish P1 (a) and R1 (b) (for key, see Extended Data Fig. 2a). The bar above each panel indicates clones and corresponding P values. uGFP, unscarred GFP. c, d, Lineage trees for clones in P1 (c) and R1 (d). The root is an unscarred clone. Clones with corresponding cell fraction and scar copy number (Supplementary Information section 3) are at the tips. The statistical confidence of each branch is computed as its proportion among 10,000 bootstrapped tree replicates. Only clones with more than 2 cells are taken into account. e, t-distributed stochastic neighbour embedding (t-SNE) map of cells from fish R1 and R2 obtained with transcriptome data. Colours indicate the cell type (Extended Data Fig. 2). HSPCs, haematopoietic stem and progenitor cells. f, t-SNE map of cells in fish R1. Colours indicate the clone. g, Clonal cell fraction per cell type for fish R1.

The average number of scars per clone equals 3.3 ± 0.3 for P1, 1.02 ± 0.01 for P2, 3.5 ± 0.3 for R1 and 3.0 ± 0.3 for R2, with a minimum of 1 scar and a maximum of 5 scars per clone, revealing that both Cas9 protein and RNA efficiently cause scarring. We determined the copy number for each scar in a clone by modelling the amplification and sequencing noise of ScarTrace as a branching process (Supplementary Information section 3). Typically, the resulting number of scars per clone is smaller than eight, as a consequence of two or more simultaneously Cas9-induced cuts in the same multi-copy tandem histone-GFP gene10. We computed the P value of a combination of scars to occur in a cell (Fig. 2a, b). Values obtained are commonly below 10−6, emphasizing that although identical scars might be independently introduced in different clones (for example, the yellow scar is present in one clone from fish R1 and four clones from fish R2), the chance of introducing the same combination of scars in independent clones is very small. Consistently, we do not find overlapping clones between different zebrafish. Using cell-to-cell variation in scar composition, we estimate a 90% scar detection efficiency (including unscarred GFP; Extended Data Fig. 1e, f). In addition, by assuming maximum parsimony for sequential scarring events, we build lineage trees for clones (Fig. 2c, d, Extended Data Fig. 2c, d, Supplementary Information section 4).

Using RaceID20 (Methods), we identify eight haematopoietic cell types in fish R1 and R2 (Fig. 2e). Gene expression profiles in the different cell types found for both fish are identical with the exception of erythrocytes, which show slight differences in the expression of characteristic markers (Extended Data Fig. 2e–h). After combining cell type and clonal information for single cells, we observe all clones in all cell types with similar proportions (Fig. 2f, g, Extended Data Fig. 2i, j), indicating that all clones contribute to the production of all blood cells. This is consistent with the timing of haematopoietic stem and progenitor cell specification (around 28 hpf), when scarring is already completed21.

Next, we used ScarTrace in the adult brain and eyes of two RNA-injected fish (R2 and R3), in which we identified different neuronal, glia and immune cells (Fig. 3a, Extended Data Fig. 3). To determine clonal enrichment or depletion in certain cell types quantitatively, we used Fisher’s exact test (Fig. 3b, Extended Data Fig. 4a). Here, several clones only generate neurons or retinal interneurons (Extended Data Fig. 4b, c). We observed that microglia share clones with the WKM, confirming that they originate from the WKM22.
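The enrichment test can be sketched as follows. This is a minimal, standard-library implementation of the two-sided Fisher's exact test on a 2×2 table of cell counts (clone versus rest, cell type versus rest). The table layout and the function name are assumptions for illustration and are not taken from the ScarTrace scripts.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: cells of the cell type inside the clone
    b: cells of other types inside the clone
    c: cells of the cell type outside the clone
    d: cells of other types outside the clone
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)

# Hypothetical counts: 8 of 10 clone cells are neurons, 1 of 6 other cells is.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
enriched = 8 / 10 > 1 / 6  # direction of the effect, read from the proportions
```

A clone would then be called enriched or depleted in a cell type when the P value falls below 0.05, with the direction read from the two proportions.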

Figure 3: Clonality in the brain and eyes.

a, t-SNE map of cells in fish R2. Colours indicate cell type (Extended Data Fig. 3). COPCs, committed oligodendrocyte progenitor cells; MFOLs, myelin-forming oligodendrocytes; OPCs, oligodendrocyte progenitor cells; RGC, retinal ganglion cell. b, Heat map of clonal cell fraction for cell types in fish R2 (COPC, OPC and MFOL clones merged as oligodendrocytes), and two-sided Fisher’s exact test for enriched and depleted clones per cell type with P < 0.05. The bars at the top depict organ, total number of cells, and P value for each clone. Bip., bipolar; RGC, retinal ganglion cell. c, Relative clone frequency in the left (L) and right (R) eye and midbrain for fish R2 and P1, and left and right eye and erythrocytes and immune cells for fish S1.

Upon the exclusion of WKM clones, we found that clones are not only cell-type specific, but also brain-region and eye specific (Extended Data Figs 4d–g, 5a–d). Although R2 and R3 left and right midbrains share a small fraction of clones, left and right eyes share none (Fig. 3c, Extended Data Fig. 4h). However, for fish P1, both midbrains share almost all clones whereas eyes share only one. To explore when this segregation is established, we injected one cell at the two-cell stage with Cas9–eScarlet fusion protein and sgRNA. We found Cas9–eScarlet protein present in only half of the embryo at dome stage (Extended Data Fig. 4i–l), approximately 3 h after Cas9 protein stops scarring. Therefore, scars only occur in one side of the embryo. However, ScarTrace on the left and right eyes of a 3-week-old-injected embryo (S1) reveals scars in both eyes. Upon removal of the clones found in immune cells and erythrocytes, the rest of the clones are specific to each eye (Fig. 3c). This indicates that both eyes get cellular contributions from both sides of the dome-stage embryo. To determine further when lateral commitment arises in eye progenitors, we built lineage trees for clones detected in the left and right eyes or midbrain for fish P1 and R2 (Extended Data Fig. 5e–h). In P1, no significant co-evolution is found among clones from the right (left) eye. By contrast, in R2 we observe a significant depletion of right eye clones evolving with left eye clones. This suggests that progenitors commit to the left or right eye shortly before the end of scarring with Cas9 protein. No significant co-evolution enhancement or depletion is found for clones detected in the left and right midbrain, indicating that cell mixing is important at 10 hpf. This is consistent with the processes of neurulation and neurogenesis23.

Next, we focused on zebrafish caudal fin ontogeny and regeneration. We performed ScarTrace on the primary, secondary and tertiary fins of fish R4, R5 and R6 (Fig. 4a). We identified four major cell types (osteoblasts, mesenchymal, epidermal and immune cells) and observed cell-type-restricted clones in all fish (Fig. 4b, c, Extended Data Figs 6, 7a–e). We found that mesenchymal and epidermal cells share clones, revealing a common developmental origin that is maintained during regeneration. Together with previous imaging-based studies24, this suggests that epidermal ancestors undergo epithelial-to-mesenchymal transition during gastrulation to generate mesenchymal cells in the caudal fin. Osteoblasts did not share clones with any other cell type in the primary fin and showed dorsal–ventral segregation, confirming their early lineage commitment during development13,25,26,27,28. We found lineage restriction of the different cell types as the main mechanism of fin regeneration, consistent with previous results25,28. However, after regeneration, we observed osteoblast-committed clones that generate a fraction (approximately 21% in R4, 44% in R6) of mesenchymal cells (Fig. 4d, Extended Data Fig. 7f). This suggests a certain degree of plasticity after injury, in which progenitors that produce osteoblasts during development can also give rise to mesenchymal cells during fin regeneration29.

Figure 4: Clonality during caudal fin regeneration.

a, Primary (first week), secondary (fully regenerated after 3 weeks) and tertiary (fully regenerated after 6 weeks) fins are amputated. Tertiary fins are split dorsally and ventrally. b, t-SNE map of cells in fish R4, R5 and R6. Colours indicate cell type (Extended Data Fig. 6). c, Heat map of clonal cell fraction per cell type and fin in fish R4, and two-sided Fisher’s exact test for clones that are enriched and depleted with P < 0.05. The bars at the top show clones in the WKM, number of cells and P value per clone. d, t-SNE map of cells from fish R4, with cells from the primary (top) or regenerated (bottom) fin detected in osteoblast clones or remaining cell types. Percentages are mesenchymal cell fractions that share clones with osteoblasts. e, Magnified view of t-SNE map indicating immune cells with fin-specific clones, immune cells that share clones with the WKM and cells with no clone information (grey). Subpopulations of lymphoid cells and myeloid cells (numbered 1–4), and the percentage of RICs are indicated. f, Differential gene expression between the four subpopulations of myeloid cells (Supplementary Table 6). g, Cell type fraction for fin-specific clones detected in RICs in the primary fin for R4.

Finally, we investigated the clonal overlap of single cells from the WKM of fish R4, R5 and R6 with immune cells found in the fin. Clones detected in the WKM are enriched in the fin immune cells and depleted in the remaining cell types (Fig. 4c, e, Extended Data Fig. 7g). We found sub-populations of lymphoid and myeloid cells in all fins with different proportions of fin-specific clones, which we identify as resident immune cells (RICs). Differential gene expression analysis in myeloid cells revealed that subpopulation 4 expressed macrophage markers together with the epithelial marker epcam (Fig. 4f), which has been reported in resident macrophages in mice30. All RICs in the primary fin share clonality with epidermal and mesenchymal cells (Fig. 4g, Extended Data Fig. 7b, c). Therefore, our data indicate that RICs have a distinct origin from haematopoietic stem cells (Extended Data Fig. 7h), and arise either from epidermal and mesenchymal transdifferentiation, or from ectodermal ancestors similarly to mesenchymal cells.

We developed ScarTrace as a new method to quantify clonal origin and cell type simultaneously at single-cell resolution. This enabled us to investigate the embryonic origin of clones found in different organs of the adult zebrafish and their cell-type commitment during development and regeneration. CRISPR–Cas9 genome editing technology for lineage tracing purposes at the single cell level has recently also been used in zebrafish to investigate lineages and cell types in the vertebrate brain, and to unravel developmental lineages31,32. We anticipate many applications of ScarTrace in developmental and stem-cell biology, and similar approaches to study clonal selection in cancer models. Because ScarTrace provides a glimpse of the cellular past, it will be interesting to explore how this history is predictive of the current epigenetic and expression state.

Methods

Zebrafish Cas9 and sgRNA injections

Heterozygous zygotes of the transgenic zebrafish line Tg(h2afva:GFP)kca66:(h2afva:GFP)kca66 (ref. 17) were injected into the cell with 1 nl Cas9 protein (NEB; final concentration 1,590 ng μl−1) or into the yolk with 1 nl Cas9 RNA (300 ng μl−1) in combination with an sgRNA that targets GFP (25 ng μl−1, sequence: GGTGTTCTGCTGGTAGTGGT) (Fig. 1b). Cas9 RNA was in vitro transcribed from a linearized pCS2-nCas9n vector (Addgene plasmid 47929)16 using the mMESSAGE mMACHINE SP6 Transcription Kit (Thermo Scientific). The sgRNA was in vitro transcribed from a template using the MEGAscript T7 Transcription Kit (Thermo Scientific). The sgRNA template was synthesized with T4 DNA polymerase (New England Biolabs) by partially annealing two single-stranded DNA oligonucleotides containing the T7 promoter and the GFP-binding sequence, and the tracrRNA sequence, respectively56. Male and female zebrafish were used, no randomization was done, no blinding was done and no animals were excluded from the analysis. No statistical methods were used to predetermine sample size. The fish used for organ isolation were 3–18 months old. For sample sizes, see Extended Data Table 2. All animal experiments were performed in accordance with institutional and governmental regulations, and were approved by the Dier Experimenten Commissie of the Royal Netherlands Academy of Arts and Sciences.

Transgene copy number

To determine the number of integrations of the transgene, we performed whole-genome sequencing (NEBNext Ultra library preparation kit for Illumina (E7370S) and the NEB Multiplex Oligos for Illumina (E7500L)) on a homozygous Tg(h2afva:GFP)kca66:(h2afva:GFP)kca66 fish. Paired-end data were trimmed (TrimGalore-0.4.3) and mapped (bwa-0.7.10 mem) to the zebrafish reference genome (danRer10 from UCSC Genome Browser), and PCR and optical duplicates were removed (Picard-2.0.1) (Extended Data Fig. 8a, b). The copy number was extracted using FREEC-11.057 with default parameters. With a 1-kb window size, we find 19 copies of the transgene fragment, whereas with a 500-bp window size, we find 18 (Extended Data Fig. 8b). After correcting for reads due to endogenous copies, we estimate the number of copies of the transgene in a heterozygous fish to be 8 ± 1. This number agrees with single-cell data: although we detected a maximum of 7 scars per clone (Extended Data Fig. 8c, d), in some of those clones 6 of the scars each represent approximately 12.5% of the scar content per cell, and one represents approximately 25% (Extended Data Fig. 8e). This again suggests that the number of integrations of the histone-GFP transgene is 1/0.125 = 8.
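The arithmetic behind this estimate can be written out as a short sketch. It assumes that the smallest scar fraction in a cell corresponds to exactly one transgene copy, which is a simplification for illustration; the copy numbers reported in the paper are inferred with the branching-process model of Supplementary Information section 3. The function name is hypothetical.

```python
def copies_from_fractions(scar_fractions):
    """Convert per-scar fractions of a cell's scar content into integer
    copy numbers, assuming the smallest fraction is exactly one copy."""
    unit = min(scar_fractions)
    return [round(f / unit) for f in scar_fractions]

# Six scars at ~12.5% each and one scar at ~25%, as in Extended Data Fig. 8e.
fractions = [0.125] * 6 + [0.25]
copies = copies_from_fractions(fractions)
total = sum(copies)  # 6 * 1 + 1 * 2 = 8 transgene copies
```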

ScarTrace protocol

Live single cells (based on DAPI exclusion and scatter properties) were sorted into 384-well plates (Sigma-Aldrich) containing 5 μl of mineral oil (Sigma-Aldrich), 50 nl of uniquely barcoded reverse transcription primers (Supplementary Table 1), dNTPs (Promega), Spike-in controls (Thermo Fisher) and RNase inhibitor (SUPERaseIn, Thermo Fisher). Plates were immediately spun down and stored at −80 °C. Cells were lysed at 65 °C for 5 min. Reverse transcription and second-strand synthesis mixes were dispensed into each well using the Nanodrop II and reactions were performed at 42 °C and 16 °C, respectively (Fig. 1d, step 1). Genomic DNA was accessed by proteinase K treatment followed by a nested PCR strategy to amplify the scarred GFP region (Fig. 1d, step 2). In the second PCR, unique scar barcodes were introduced in each well (Supplementary Table 2). All cells were pooled and the aqueous phase was separated from the oil phase (Fig. 1d, step 3). The collected material was split for scar library and transcriptome library preparation (Fig. 1d, step 4). For transcriptome library preparation, the SORT-seq protocol18 was used (Fig. 1d, step 5a). For scar library preparation, a PCR introducing only Illumina TruSeq adapters was performed (Fig. 1d, step 5b). All libraries were sequenced paired-end at 75 bp read-length on the Illumina NextSeq platform. A detailed description of the protocol is available in Protocol Exchange33.

WKM isolation

The WKM was isolated as previously described34. A ventral midline incision was made to open the adult zebrafish body cavity. All internal organs were carefully removed to access the kidney. The WKM was collected in PBS supplemented with FCS. The tissue was aspirated through a 1 ml pipet tip several times to mechanically dissociate haematopoietic cells. After two consecutive filtering steps (using 70-μm and 40-μm cell strainers (VWR)), cells were centrifuged and washed. The pellet of haematopoietic cells was resuspended in PBS and FCS supplemented with DAPI (Thermo Fisher) to assess cell viability.

Brain parts and eye isolation

Brain and eyes were isolated from the zebrafish head and dissected in PBS. Optic nerves were removed. The forebrain (olfactory bulb and telencephalon) was isolated from the midbrain, followed by dissection of the hindbrain (rhombencephalon). The midbrain (mesencephalon) was dissected into left and right midbrain. The eye lens was carefully removed. Brain parts and eyes were dissociated into single cells using a papain-based solution (Thermo Fisher, 88285) and washing solutions as previously described35. The washed cell pellet was resuspended in DMEM/F12 medium (Thermo Fisher, 11320033) and supplemented with DAPI (Thermo Fisher) to assess cell viability for FACS.

Fin amputation

Caudal fin amputations were performed as previously described36, after which fish were returned to 28 °C aquarium water. Once isolated, this tissue was immediately dissociated by moderately shaking at 30 °C for 1 h, with gentle trituration performed every 10 min with a p200 pipet, in a solution of 2 mg ml−1 collagenase A (Sigma-Aldrich) and 0.3 mg ml−1 protease (type XIV, Sigma-Aldrich) in Hanks solution. After 1 h, the solution was incubated for 5 min in 0.05% trypsin in PBS. The solution was strained using 70-μm and 40-μm cell strainers (Corning) and cells were washed in 2% FBS in Hanks solution. Before flow cytometry, cells were centrifuged and resuspended in PBS and FBS supplemented with DAPI (Thermo Fisher) to assess cell viability.

Transcriptome analysis

In transcriptome libraries, the first read contains cell barcode (Supplementary Table 1) and unique molecular identifier (UMI) information, and the second read contains biological information. Second reads with a valid cell barcode extracted from corresponding first reads are mapped using bwa mem-0.7.10 with default parameters to the reference zebrafish transcriptome (Danio rerio assembly Zv9, Ensembl 74, extended with ERCC92). For each cell, the number of transcripts per gene was obtained as previously described37. We refer to transcripts as unique molecules based on UMI correction. We ran RaceID3 with different parameters for each organ under study (Supplementary Data 1) for cell filtering, normalization, gene filtering, cell clustering and differential gene expression analysis (in which P values are calculated using negative binomial distribution and corrected for multiple testing by the Benjamini–Hochberg method). The choice of filtering parameters was made to include the maximum number of cells in our analysis without losing cell type information. Supplementary Tables 3–6 provide results for the differentially expressed genes for each cell type compared with all other cells in the organ: WKM38,39,40,41 (90 dendritic cells, 76 eosinophils, 641 erythrocytes, 516 haematopoietic stem and progenitor cells, 446 lymphocytes, 409 monocytes, 927 neutrophils and 76 thrombocytes), brain and eyes42,43,44,45,46,47,48,49,50,51 (250 bipolar and horizontal cells, 45 COPCs, 9 cones, 290 erythrocytes, 254 immune cells, 88 glia-like cells, 89 MFOLs, 66 microglia, 1,427 neurons, 10 OPCs, 31 RCL, 53 radial glia and 202 rods), caudal fin52,53,54 (144 epidermal cells, 2,834 fibroblasts, 1,784 immune cells, and 2,951 osteoblasts), and resident myeloid cell types in the fin (118 cells in subpopulation 1, 45 in subpopulation 2, 27 in subpopulation 3 and 133 in subpopulation 4).
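The multiple-testing correction mentioned above can be illustrated with a minimal sketch of the Benjamini–Hochberg procedure. This is a generic standard-library implementation, not code taken from RaceID3; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical P values for four genes.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Genes whose adjusted P value falls below the chosen threshold would then be reported as differentially expressed.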

Scar analysis

In scar libraries, the first read contains the cell barcode (Supplementary Table 2) and the forward primer used in the nested PCR, and the second read contains the sequence for the scar and the reverse primer. Scripts to extract scars and detect clones are provided as Supplementary Data 2, together with a reference manual (Supplementary Data 3). Bug fixes and updates of the scripts can be downloaded from https://github.com/anna-alemany/scScarTrace. Cells sharing an identical scar pattern are assumed to come from the same clone, independently of scar percentage. Cells with a detected scar pattern that can be assigned to another single clone by assuming that some scar was not sampled were pooled with that clone. Cells that according to their scar pattern can be ambiguously assigned to two or more other clones were removed from subsequent analysis. Clones with fewer than three cells were also removed.
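The clone-assignment rules above can be sketched as follows. This is one possible reading of those rules, written for illustration only: it represents each scar pattern as a set of scar identifiers and treats "some scar was not sampled" as the pattern being a strict subset of another pattern. The function name, the input format and the handling of edge cases are assumptions and may differ from the scripts in Supplementary Data 2.

```python
from collections import defaultdict

def assign_clones(cell_scars, min_cells=3):
    """Map cell id -> clone pattern (frozenset of scar ids).

    cell_scars: dict of cell id -> iterable of scar ids detected in that cell.
    Cells with an ambiguous pattern and clones smaller than min_cells are dropped.
    """
    patterns = {cell: frozenset(scars) for cell, scars in cell_scars.items()}
    distinct = set(patterns.values())

    assignment = {}
    for cell, pattern in patterns.items():
        # Patterns that contain this one plus at least one extra scar.
        supersets = [p for p in distinct if pattern < p]
        if not supersets:
            assignment[cell] = pattern        # the pattern defines its own clone
        elif len(supersets) == 1:
            assignment[cell] = supersets[0]   # assume a scar was not sampled
        # Two or more candidate clones: ambiguous, so the cell is removed.

    # Count cells per clone and drop clones below the size threshold.
    sizes = defaultdict(int)
    for clone in assignment.values():
        sizes[clone] += 1
    return {cell: clone for cell, clone in assignment.items()
            if sizes[clone] >= min_cells}

# Hypothetical cells and scar identifiers.
cells = {
    "a1": ["s1", "s2"], "a2": ["s1", "s2"], "a3": ["s2"],
    "b1": ["s3", "s4"], "b2": ["s3", "s4"],
    "c1": ["s1", "s5"], "c2": ["s1", "s5"], "c3": ["s1", "s5"],
    "x": ["s1"],
}
clones = assign_clones(cells)
```

In this example, cell a3 is pooled with the clone carrying s1 and s2, cell x is dropped because it fits two clones, and the two-cell clone carrying s3 and s4 is dropped for being too small.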

Code availability

Transcriptome analysis was performed using RaceID3 available at https://github.com/dgrun/RaceID3_StemID2, with parameters summarized in Supplementary Data 1. Scripts for scar extraction and clone detection are provided in Supplementary Data 2, together with a reference manual (Supplementary Data 3). Bug fixes and updates of the scripts can be downloaded from https://github.com/anna-alemany/scScarTrace.

Data availability

The RNA sequencing datasets reported in this paper have been deposited in the Gene Expression Omnibus (GEO) under accession GSE102990.