1 Background

The advent of next-generation sequencing (NGS) platforms made it possible to sequence DNA more efficiently and economically than Sanger sequencing. In addition, the application of NGS in cancer genomics allowed for a deeper understanding of the underlying genetics and pathogenesis of cancer. The human body is composed of trillions of cells that belong to approximately 200 different cell types; however, individual cells from defined cell types are diverse with unique expression profiles [1]. Many cell types/subtypes have few reliable markers that can be used for purification which is in part due to the fact that even cell types with well-established markers contain diversity [1]. Standard techniques for cancer analysis involve averaging signals from mixed populations of cells, which may mask or hide rare/small tumor clones and subclones that contribute to cell diversity [2]. However, with the use of single-cell genomics, the underlying genetics, expression levels, and epigenetics of every gene in the genome can now be analyzed across thousands of individual cells.

The first report of RNA transcriptome sequencing of single mammalian cells occurred in 2009, and the first report of DNA genome sequencing of single human cancer cells occurred in 2011 [2, 3]. Since that time, studies on single-cell exome and whole genome sequencing in varying types of cancers including renal, myeloproliferative, colon, lung, glioblastoma, breast, and prostate have been conducted [4–11]. The original single-cell sequencing method combined flow sorting, whole genome amplification, and NGS to generate genome-wide datasets from single cancer cells; however, it had only ~10% physical coverage of a single cell’s genome, sufficient for measuring large-scale copy number changes, but insufficient for resolving mutations at base-pair resolution [2]. Since that time, several other methods have been developed that can achieve high coverage (>90%) from single mammalian cells.

The data now available from single-cell sequencing have revolutionized cancer cell biology, yet the underlying mechanisms of how tumor and clonal diversity contribute to cancer biological processes remain largely unknown. Intratumor heterogeneity, clonal evolution, underlying mechanisms of tissue invasion, metastasis, and response to cancer-related therapies can potentially be elucidated by investigating molecular signatures at the single-cell level (Fig. 9.1). Tumor diversity is shaped by selection pressures, which can alter the underlying genetics of a cancer cell population. Examples of selection pressures include effects of the immune system, hypoxia, nutrient deprivation, geographical barriers, pH changes, and chemotherapy [2]. Understanding the underlying genetics of intratumor heterogeneity at the single-cell level has the potential to reveal cancer therapy resistance mechanisms that are lost at the bulk level. Single-cell tumor phylogenetic trees have the potential to reveal driver mutations, which can be used to target the small populations of cells that harbor resistance mutations after treatment [12].

Fig. 9.1

Applications of single-cell sequencing. Investigating the molecular signature at the single-cell level can elucidate the mechanisms behind (a) intratumor heterogeneity, where different tumor cells show distinct genotypic and phenotypic variability (represented by different colored cells); (b) clonal evolution, or genetic diversification and clonal selection, where different colored cells represent genetic changes; (c) tissue invasion, where mutant cells invade adjacent tissues (pink cells), with the potential to (d) travel to different sites, or metastasis; (e) crosstalk between cells with newer technologies; and (f) response to cancer-related therapies and subsequent clonal evolution.

Identification of cancer therapy resistance mechanisms at the single-cell level may also reveal novel mutations after induction of therapeutic agents (which generate clonal and subclonal populations of cells). In fact, NGS studies on single-cell mutations that drive tumorigenesis have revealed that resistance mutations vary from tumor to tumor [13]. Single-cell sequencing also has the potential to illuminate the mechanisms behind metastatic dissemination. For instance, several groups have used circulating tumor cells (CTCs) to study genomic and transcriptomic data from metastatic colon cancer, lung adenocarcinoma, and melanoma [2].

Bulk tumor gene expression studies are composites of transcriptional changes of heterogeneous cell populations; however, analysis of tumor cell expression at the single-cell level resolves average tumor expression profiles into those of specific cell types, including non-malignant stromal, immune, and tissue-specific cells. In addition, single-cell transcriptomics aids in the detection of novel variants after treatments that may potentially drive drug resistance or serve as biomarkers of therapeutic success [14]. Furthermore, single-cell sequencing can detect low-abundance transcripts and/or novel RNA variants that are not detectable in bulk cell populations.

2 Single-Cell Sequencing Modalities

2.1 Sample Type and Preparation

Current single-cell technologies assay a single cell’s gene expression, DNA variation, epigenetic state, and nuclear structure (see Table 9.1). In order to analyze genomics at the single-cell level, cells need to be freed from the extracellular matrix and from cell–cell adhesion for downstream processes. One of the major challenges of single-cell sequencing from solid tissues is the unbiased disaggregation of the tissue into a suspension of single cells [15]. This is important because preferential processing or lysis of one cell type over another may skew data generation and thus analysis of results.

Table 9.1 Comparison of single-cell sequencing modalities

Single cells can be obtained from virtually any tissue, and current research focuses on circulating blood cancer cells, solid tumors, and circulating tumor cells (CTCs). Single cells from solid tumors can be obtained following surgical removal, sampling/biopsy of the primary tumor or other organs with overt metastasis, or bone marrow aspiration. Solid tumors, especially in the invasive metastatic stages, are also known to shed cells into systemic circulation. These cells in the patient’s bloodstream are the CTCs, and those that disseminate to distant organs are termed disseminated tumor cells (DTCs).

Different methods for isolating single cells of interest from a suspension have been developed. Early studies used manual methods of cell isolation, using specialized pipettes or micromanipulation devices to isolate single cells [17]. This method of single-cell isolation has low throughput but can be used when a small number of cells are to be analyzed [22]. Other methods include fluorescence-activated cell sorting (FACS), magnetic-activated cell sorting (MACS), laser capture microdissection (LCM), and microfluidics, all of which can generally be used for larger numbers of cells. Microfluidics is a common method of single-cell isolation and allows for high-throughput investigation of complex cellular systems using nanoliters of material. Microfluidics technologies isolate and encapsulate single cells in reaction chambers or droplets, followed by standardized and automated nanoliter reactions, including barcoded sequence library prep for RNA and DNA sequencing. Commercially available microfluidic devices include the Fluidigm C1 system, the 10× Genomics Chromium, and the Illumina Bio-Rad SureCell system [23].

2.2 Single-Cell DNA Sequencing

Whole genome amplification (WGA) followed by DNA sequencing identifies the underlying genetics and mutation frequencies of a single cell. Various methods for WGA at the single-cell level are available (multiple displacement amplification (MDA), PCR, or a combination of both; see Table 9.1). These methods can now achieve >90% coverage of a single-cell genome, and mutations can be detected at single base-pair resolution [17, 2]. Technical challenges remain, however, including effective isolation and lysis of single cells, uniform amplification of the whole genome, quality assessment of single-cell amplified genomes, sequencing library preparation, and data analysis [18].

True point mutations or base substitutions in single cells have to be discriminated from polymerase errors and sequencing errors. Errors that can occur during the WGA process include allelic dropout (one allele is not amplified), over- or under-amplification of some sequences, false-positive errors due to the infidelity of the DNA polymerase, and uneven amplification [2, 12]. By nature of the amplification process using DNA polymerases, errors that occur in the initial rounds of amplification are then inherited by all molecules copied from the affected product [17]. Single-cell genome-wide DNA sequencing is more challenging than single-cell transcriptomics because there is simply less template available. Whereas single cells can contain up to thousands of copies of a given mRNA molecule, there are only two copies of each chromosome (or gene for that matter), and therefore only two template DNA molecules for WGA reactions.
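The inheritance of early amplification errors can be made concrete with a small calculation. The following is a minimal sketch under a deliberately idealized assumption that is not taken from the cited work: the pool of molecules doubles perfectly in every cycle, and the polymerase makes a single error on one newly synthesized copy. The function name `error_fraction` and the default of two template molecules are illustrative choices, and real WGA chemistries such as MDA do not amplify in clean doubling cycles.

```python
from fractions import Fraction


def error_fraction(cycle, templates=2):
    """Fraction of the final pool carrying an error made in a given cycle.

    Assumption (idealized, hypothetical model): the pool doubles perfectly
    each cycle, so after `cycle` cycles it holds templates * 2**cycle
    molecules, exactly one of which is the new erroneous copy. Because that
    copy and the rest of the pool keep doubling together, the fraction stays
    fixed for all later cycles.
    """
    if cycle < 1:
        raise ValueError("cycle numbering starts at 1")
    return Fraction(1, templates * 2 ** cycle)


for cycle in (1, 5, 10):
    print(f"error in cycle {cycle}: {error_fraction(cycle)} of molecules")
```

Under these assumptions, an error made in the first cycle ends up in 1/4 of all molecules, whereas the same error made in the tenth cycle is carried by only 1/2048 of them, which is why early errors are the ones most easily mistaken for true variants.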

2.3 Single-Cell RNA Sequencing

Single-cell RNA sequencing essentially reveals the transcriptional status at the single-cell level. Whole transcriptome sequencing, or RNA seq, in which exclusively messenger RNA is assayed from single cells, is the most widely used method of single-cell analysis [24, 17, 25, 26]. It measures global gene expression by reverse transcription of mRNA into cDNA, and downstream sequencing libraries are made from hundreds to thousands of individual cells (Fig. 9.2). Gene expression is measured directly by counting the number of reads or unique molecular identifiers (UMIs) that originate from each gene in a single cell. RNA seq of single cells achieves greater resolution than RNA seq of cell populations at the cost of less coverage [14]. The challenge with single-cell transcriptomics, as with most of the sequencing methods, is the “noise” generated from such experiments [27, 28, 12]. Biological variation is derived from genetic, epigenetic, environmental, and cellular factors, whereas technical noise can be introduced in the course of processing from sample handling, cell isolation, reverse transcription, cDNA amplification, sequencing, and analysis.

Fig. 9.2

Workflow of the single-cell RNA sequencing process. Starting from harvesting tumors from patients or model organisms, fresh tissues are rapidly broken down into single cells, which are the input material for the 10× Genomics pipeline to generate sequencing libraries from single cells. Finally, the libraries are sequenced and data is analyzed to elucidate the cellular heterogeneity and biology of individual tumors

New techniques and methods are continuously being developed and refined to limit the aforementioned challenges [29]. A crucial step of single-cell RNA seq is the unbiased amplification of cDNA before sequencing [27, 30, 28]. The use of unique molecular identifiers (UMIs), which barcode each molecule before amplification, allows robust quantification by correcting for amplification bias. Cell throughput is high, and the use of UMIs to barcode individual transcripts also helps distinguish heterogeneous gene expression differences [19]. Microfluidics uses a process of capturing cells within nanofluidic chambers and has considerably improved sensitivity for mutation detection by minimizing allelic dropouts [19, 12]. Drawbacks include reduced sensitivity such that only the most abundant 10–20% of transcripts can be quantitated. In addition, reliable amplification and detection of transcripts expressed at less than 10 copies per cell is a challenge and can lead to inaccurate quantification of low-abundance transcripts [27, 28, 2]. Doublets, in which two or more cells are captured together and therefore share the same cell barcode, can also occur during sample processing; therefore, it is important to validate true single-cell capture before subsequent analysis.
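The way UMIs correct for amplification bias can be illustrated with a short sketch. It assumes a simplified, hypothetical record layout in which each aligned read has already been reduced to a (cell barcode, UMI, gene) tuple; the barcodes, UMI sequences, and gene names below are invented for illustration, and the correction of sequencing errors within UMIs that production pipelines perform is omitted.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse reads into UMI counts per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples; this layout
    is a hypothetical simplification of an aligned, tagged read.
    """
    seen = defaultdict(set)  # (cell, gene) -> set of distinct UMIs
    for cell, umi, gene in reads:
        # Amplification duplicates share cell, gene, and UMI, so they
        # collapse into a single set entry here.
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


reads = [
    ("CELL_A", "AACGT", "GENE_X"),
    ("CELL_A", "AACGT", "GENE_X"),  # amplification duplicate of the read above
    ("CELL_A", "GGTCA", "GENE_X"),
    ("CELL_A", "TTAGC", "GENE_Y"),
    ("CELL_B", "AACGT", "GENE_Y"),  # same UMI in another cell: counted separately
]

umi_counts = count_umis(reads)

# Raw read counts for comparison; these are inflated by amplification.
read_counts = defaultdict(int)
for cell, _, gene in reads:
    read_counts[(cell, gene)] += 1

print(umi_counts[("CELL_A", "GENE_X")], read_counts[("CELL_A", "GENE_X")])
```

In this toy input, GENE_X in CELL_A is supported by three reads but only two distinct UMIs, so the UMI count reports two original molecules rather than three.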

2.4 Single-Cell Epigenetic Analysis

Whereas transcriptomics can be described as tracking “output” signals of a given genetic locus, such as a protein-encoding gene, the analysis of chromatin and epigenetic changes may be thought of as tracking the “input” signals of the locus in question (Wills 2015). With advances in technology, it is now easier to probe epigenetic phenomena at the single-cell level, including single-cell analysis of DNA accessibility, methylation status, histone modifications, and chromosome conformation by bisulfite sequencing, DNase I hypersensitivity sequencing, ATAC-seq, and single-cell Hi-C, with the latter methods being the most developed for single-cell sequencing approaches [21, 31, 32, 33, 34, 35]. Challenges associated with these techniques are very similar to those of single-cell DNA sequencing.

Whole genome bisulfite sequencing assays identify DNA methylation (CpG islands). Chromatin immunoprecipitation sequencing (ChIP-seq) has yet to be adapted for single-cell sequencing; however, a method termed single-cell Hi-C analyzes active chromatin domains in cell nuclei [35]. This method measures proximity between sites in the genome in three dimensions, producing a “contact map” that can be used to identify looping interactions between regulatory elements and gene loci [33, 35, 15]. The single-cell assay for transposase-accessible chromatin with sequencing (ATAC-seq) is another method used to identify open chromatin regions, which, in cancer cells, are often associated with oncogene expression and represent sites with increased vulnerability to mutagenic assault [20].

2.5 Other Single-Cell Analysis Techniques

One weakness of current single-cell techniques is that they require analysis of single cells in suspension, which does not capture information pertaining to cell-to-cell interactions [2, 12]. Moreover, tumor clones evolve dynamically in space and time, and single-cell samples from one region of an individual tumor may reveal mutations that are clonally dominant there but may not be apparent in other regions of the tumor [2]. These limitations have been partially overcome by studying spatiotemporal dynamics (discussed further below): serial sampling of the same patient provides information about the evolution of a tumor through time (this is easier for liquid than for solid tumors); multiple anatomically distinct biopsies capture intratumor heterogeneity; in situ sequencing and imaging techniques provide spatial resolution; and laser capture microdissection helps recover information about cell-to-cell interactions that is otherwise lost [12].

Finally, new methods that integrate different single-cell genomic approaches and functional assays have recently been developed and include simultaneously measuring two or more modalities, whether it be genome and transcriptome, transcriptome and methylome, or RNA and protein [36, 17]. The ultimate goal of linking phenotypes of cells and their genotypes includes validation of gene expression and further development of precision medicine [36]. There are many different approaches available, some of which include gDNA-mRNA sequencing (DR-seq), genome and transcriptome sequencing (G&T-seq), single-cell DNA methylation analysis through bisulfite sequencing (scBS-seq) and reduced representation bisulfite sequencing (scRRBS-seq), as well as single-cell genome, methylome, and transcriptome sequencing (scTrio-seq) [37]. When extracting multiple “omic” datasets from individual cells, there are similar quality compromises as discussed previously [12].

3 Data Analysis

Single-cell measurements preserve crucial information that is lost in bulk assays. Statistical and computational methods are critical to extract meaningful information from the data [36]. Single-cell analysis is based on the analysis of a cell modality (genetic variations, cell expression profile, changes in chromatin conformation, etc.), compared with some critical threshold. This threshold depends on the variability that exists in the assay as well as biological variability [15]. Statistical models can account for this variability.

Whole genome and whole transcriptome amplification as well as sequencing data are more difficult to analyze at the single-cell level than in bulk experiments [38, 39, 40, 1]. Bulk experiments, however, typically have only dozens of samples, and bulk measurements cannot distinguish between fluctuations due to changes in gene regulation and shifts in the ratio of different cell types. For bulk genome sequencing experiments, DNA is extracted from millions of cells, with intermixed sequences from different tumor clones, as well as normal cells [17]. Single-cell expression measurements can be variable; therefore, separating technical variability from biological variability is essential. Computational methods can help determine which mutations are clonal (present in all tumor cells) and which are subclonal. Point mutations and copy number data can be further analyzed with bioinformatics algorithms, and phylogenetic trees of different tumor subclones can be inferred. Analyzing expression levels in properly grouped subpopulations of cells allows a more accurate measurement of expression among different cell subpopulations [41, 15].
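The distinction between clonal and subclonal mutations can be sketched in a few lines. This is a minimal illustration, not a published method: it assumes a hypothetical input in which mutation calls per cell have already been filtered for technical errors, and the cell and mutation names are invented. Real analyses must additionally model allelic dropout and false positives, which this sketch ignores.

```python
def classify_mutations(calls, tumor_cells):
    """Label each mutation as clonal or subclonal.

    `calls` maps a cell name to the set of mutations detected in that cell
    (a hypothetical, already error-filtered input). A mutation is called
    clonal only when every tumor cell carries it.
    """
    observed = set()
    for cell in tumor_cells:
        observed |= calls[cell]

    labels = {}
    for mutation in sorted(observed):
        carriers = sum(1 for cell in tumor_cells if mutation in calls[cell])
        status = "clonal" if carriers == len(tumor_cells) else "subclonal"
        labels[mutation] = (status, carriers / len(tumor_cells))
    return labels


calls = {
    "tumor_1": {"mut_A", "mut_B"},
    "tumor_2": {"mut_A", "mut_B"},
    "tumor_3": {"mut_A", "mut_C"},
    "tumor_4": {"mut_A"},
    "normal_1": set(),  # normal cell, left out of the tumor cell list below
}
tumor_cells = ["tumor_1", "tumor_2", "tumor_3", "tumor_4"]

labels = classify_mutations(calls, tumor_cells)
for mutation, (status, fraction) in labels.items():
    print(mutation, status, fraction)
```

Here `mut_A` is found in all four tumor cells and is labeled clonal, while `mut_B` and `mut_C` are carried by only some cells and are labeled subclonal; the shared subclonal mutations are the kind of signal from which a phylogenetic tree of subclones could then be inferred.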

One of the major tasks of single-cell RNA seq data analysis is the resolution of cellular heterogeneity (Fig. 9.3). Most pathological samples such as tumor tissues consist of multiple cell types. Tumor tissues usually contain primary tumor cells, stromal cells, endothelial cells, and immune cells recruited from the peripheral blood and lymphatic organs. They are derived from different cellular lineages and play different roles in the tumor initiation, progression, and metastasis, which makes a tumor sample very complex and difficult to investigate. Before looking deeper into the tumor clonal variation, the tumor and its microenvironment should be resolved first. By identifying signature genes expressed in each cell type, one can further delineate the states of different cell types.

Fig. 9.3

Single-cell RNA sequencing data analysis. (a) A representative example of a three-dimensional t-SNE rendering of scRNA seq data for a mouse pancreatic tumor. Each dot depicts a single cell, and colored clusters represent distinct cell types identified in the tumor. (b) Quantification of each individual cell type, showing the five most commonly observed cell types identified in this tumor. CAF: cancer-associated fibroblasts; EMT: tumor cells undergoing epithelial to mesenchymal transition; ENDO: endothelial cells; ETC: epithelial tumor cells; TAM: tumor-associated macrophages

Understanding the cellular composition and tumor evolution can facilitate the evaluation of the tumor state and aid treatment decisions. Under the assumption that biological variability is larger than technical variability, it is generally believed that similar cell types will cluster together according to their cell type-specific expression features. The dynamic change of a specific cell type from one state to another can sometimes also be captured by such a metric, though this is very challenging.

The calculation of the similarity between cells is usually based on multiple gene expression features. The relative position of these cells in a high-dimensional space is not easy to comprehend. Principal component analysis (PCA) is one of the most commonly used algorithms for reducing the dimensionality of data. Nonetheless, for single-cell RNA seq data, the first two to three principal components cover a very small proportion of the variance and therefore make the clustering less representative than for other gene expression datasets [42]. A more advanced, nonlinear dimensionality reduction algorithm called t-Distributed Stochastic Neighbor Embedding, or t-SNE [42], has been proposed to explore high-dimensional single-cell RNA seq data. t-SNE itself does not necessarily define the similarity metric between cells but makes the similarities visualizable in a low-dimensional space. This algorithm is widely used in part because it can be adapted to many visualization tools that are easily understood and interpreted by non-computational biologists (Fig. 9.3). The relationships across cell types can be inferred by the high-dimensional calculation and explored by the low-dimensional visualization.
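What it means for the leading principal components to "cover" a proportion of the variance can be shown with a small calculation. The sketch below is an illustration only: it uses plain Python with power iteration rather than a numerical library, the function name is an illustrative choice, and the two toy matrices (cells as rows, two genes as columns, values already centered or easily centered) are invented and far smaller than real single-cell data.

```python
import math


def first_component_variance_fraction(matrix, iterations=200):
    """Share of total variance captured by the first principal component.

    `matrix` is a list of cells, each a list of expression values per gene.
    The top eigenvalue of the gene-by-gene covariance matrix is estimated by
    power iteration and divided by the total variance (the matrix trace).
    """
    n_cells, n_genes = len(matrix), len(matrix[0])
    means = [sum(row[g] for row in matrix) / n_cells for g in range(n_genes)]
    centered = [[row[g] - means[g] for g in range(n_genes)] for row in matrix]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n_cells - 1) for j in range(n_genes)]
        for i in range(n_genes)
    ]
    total = sum(cov[g][g] for g in range(n_genes))
    if total == 0:
        return 0.0  # no variance at all

    def apply(vec):
        return [sum(cov[i][j] * vec[j] for j in range(n_genes)) for i in range(n_genes)]

    # Deterministic, uneven start vector; a random start is the usual choice.
    vec = [1.0 / (g + 1) for g in range(n_genes)]
    for _ in range(iterations):
        nxt = apply(vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break  # start vector carried no variance; result is unreliable
        vec = [x / norm for x in nxt]

    # The Rayleigh quotient gives the eigenvalue estimate for the final vector.
    projected = apply(vec)
    top = sum(v * p for v, p in zip(vec, projected)) / sum(v * v for v in vec)
    return top / total


correlated = [[0, 0], [1, 1], [2, 2], [3, 3]]      # two genes moving together
unstructured = [[1, 0], [-1, 0], [0, 1], [0, -1]]  # variance spread evenly

print(first_component_variance_fraction(correlated))
print(first_component_variance_fraction(unstructured))
```

For the correlated toy data the first component captures essentially all of the variance, whereas for the evenly spread data it captures only about half. With thousands of genes and noisy counts, the share held by the first few components can become much smaller, which is the situation described above.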

The single-cell clustering and trajectory method of analysis can be used to define cell types and stable cell states. For example, clustering of three different cell types from one tumor by gene expression can group single cells into those of invasive tumor cells, noninvasive tumor cells, and stromal cells. When clustering, or grouping cells by gene expression, each cell represents a point in space based on expression of genes (approximately 30,000 genes). Clustering is based on the measured distances between points, and cells are grouped based on mutual proximity.
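Grouping cells by mutual proximity can be sketched as follows. This is a deliberately simple stand-in for the clustering algorithms used in practice: cells are linked whenever their Euclidean distance falls below a threshold, and each connected group becomes a cluster. The three-gene profiles, cell names, and the `max_distance` value are hypothetical; real data would have thousands of genes and would usually be normalized and reduced in dimension first.

```python
import math


def cluster_by_proximity(profiles, max_distance):
    """Group cells whose expression profiles lie close together.

    `profiles` maps a cell name to a tuple of expression values. Two cells
    are linked when their Euclidean distance is at most `max_distance`;
    clusters are the connected groups of linked cells.
    """
    unassigned = set(profiles)
    clusters = []
    for name in profiles:
        if name not in unassigned:
            continue  # already absorbed into an earlier cluster
        unassigned.discard(name)
        stack, members = [name], []
        while stack:
            current = stack.pop()
            members.append(current)
            neighbours = [
                other for other in unassigned
                if math.dist(profiles[current], profiles[other]) <= max_distance
            ]
            for other in neighbours:
                unassigned.discard(other)
                stack.append(other)
        clusters.append(sorted(members))
    return clusters


# Hypothetical expression of three marker genes per cell.
profiles = {
    "invasive_1": (9.0, 1.0, 0.5),
    "invasive_2": (8.5, 1.2, 0.4),
    "noninvasive_1": (1.0, 8.0, 0.6),
    "noninvasive_2": (1.3, 8.4, 0.5),
    "stromal_1": (0.5, 0.7, 9.0),
    "stromal_2": (0.4, 1.0, 8.6),
}

clusters = cluster_by_proximity(profiles, max_distance=2.0)
for members in clusters:
    print(members)
```

With these toy values the six cells separate into three groups of two, mirroring the invasive, noninvasive, and stromal example in the text. The result depends strongly on the chosen threshold, which is one reason practical tools use more robust clustering criteria.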

Besides resolving the cellular composition of heterogeneous tissues, single-cell studies also aim to characterize genes in such a way that inferences can be made about where a cell is in time and what drives a cell to transition from one state to another [15]. Using current methods, cells are analyzed at a particular state and time, because the intact sample is destroyed by, for example, cell lysis (in the case of RNA seq and qRT-PCR) or cell fixation (in the case of FISH). Time series experiments extrapolate cell transition states through time; however, they can distort results by averaging the expression of cells that proceed through a biological process in an unsynchronized manner [43, 15]. Any particular sample at a given time-point contains cells at varying stages of cell growth and transition, reflecting the underlying dynamics of genes relevant to the transitional state. Single-cell analysis can define genes that are differentially expressed during these transitional states. In order to recover the true signal of relevant expression, cells are re-ordered in “pseudotime” according to biological progress (for example, percent cells differentiated instead of time). Two algorithms based on pseudotime are currently available, Wanderlust and Monocle, both of which attempt to define the genes responsible for cell transition [43, 15].
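The idea of re-ordering cells by biological progress can be illustrated with a toy pseudotime ordering. The sketch below is not an implementation of Wanderlust or Monocle; it is a simplified, graph-based illustration in which cells are connected to their nearest neighbors in expression space and ranked by shortest-path distance from a user-chosen starting cell. The two-gene profiles, cell names, and the choice of `k` are hypothetical.

```python
import heapq
import math


def pseudotime_order(profiles, start, k=2):
    """Order cells by graph distance from a chosen starting cell.

    `profiles` maps a cell name to a tuple of expression values. Each cell
    is linked to its k nearest neighbours (links are made symmetric), and
    shortest-path distances from `start` are computed with Dijkstra's
    algorithm. Cells unreachable from `start` are left out of the result.
    """
    names = list(profiles)
    graph = {name: {} for name in names}
    for name in names:
        nearest = sorted(
            (math.dist(profiles[name], profiles[other]), other)
            for other in names if other != name
        )[:k]
        for distance, other in nearest:
            graph[name][other] = distance
            graph[other][name] = distance

    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        distance, name = heapq.heappop(heap)
        if distance > best.get(name, math.inf):
            continue  # stale heap entry
        for other, weight in graph[name].items():
            candidate = distance + weight
            if candidate < best.get(other, math.inf):
                best[other] = candidate
                heapq.heappush(heap, (candidate, other))
    return sorted(best, key=best.get)


# Hypothetical cells captured in one sample at unsynchronized stages:
# an "early" marker falls while a "late" marker rises.
profiles = {
    "cell_C": (5.0, 5.0),
    "cell_A": (10.0, 0.0),
    "cell_E": (0.0, 10.0),
    "cell_B": (8.0, 2.0),
    "cell_D": (2.0, 8.0),
}

order = pseudotime_order(profiles, start="cell_A")
print(order)
```

Although the cells are listed in arbitrary order, the ranking recovers the progression from `cell_A` to `cell_E`. Genes whose expression changes along such an ordering are candidates for involvement in the transition, although published methods add substantial machinery to cope with noise and branching.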

4 Application of Single-Cell Analysis in Precision Cancer Therapy

The translational application of single-cell sequencing in precision cancer therapy has the potential to improve cancer diagnostics, prognostics, targeted therapy, early detection, and noninvasive monitoring [2]. It is now technically and economically feasible to sequence single-cell DNA and RNA. Single-cell sequencing allows highly sensitive detection of rare mutations and cell-specific gene expression profiles. This method can identify rare tumor tissue variants that have the potential to drive drug resistance or serve as biomarkers of therapeutic success, ultimately advancing cancer genomics [14].

The importance of single-cell techniques in the clinical setting can be illustrated in tumor sampling. A single sample from a tumor does not represent the tumor as a whole. Spatially separate samples from a single tumor (or from metastases elsewhere in the body) are composed of varying proportions of cell types and/or have diverse underlying genetic and epigenetic makeup, a property known as tumor heterogeneity. Greater tumor heterogeneity may predict poorer response to therapy, higher probability of metastasis, or poor overall survival [2]. Identification of founder mutations, inferred from tumor phylogenetic trees, may aid in prediction of response to treatment.

Sequencing at the single-cell level can detect low-abundance mutations, facilitating the identification of drivers of drug resistance. Drug resistance dynamics have previously been modeled in a metastatic breast cancer cell line using RNA seq technology [14]. When metastatic breast cancer cells were treated with paclitaxel, stressed cells arrested and died, whereas rare drug-tolerant cells resumed proliferation and their clones expanded. The ability to profile both the genome and transcriptome of the same cells has the potential to elucidate heterogeneity at the genome, epigenome, and transcriptome levels.

Drug development is a lengthy and expensive endeavor with a high failure rate [10]. It includes many steps: identification of drug targets and candidates, and assessment of drug resistance, drug toxicity, and pharmacokinetics. Many drugs emerge from preclinical studies only to fail in clinical trials. NGS has identified new target candidates for drug development. Single-cell sequencing in drug development expands on bulk genomic data by offering a more thorough and comprehensive picture of the underlying genetics, epigenetics, and transcriptomics of responders versus non-responders at an individual cell level. This ultimately allows for improved efficiency and accuracy in identifying candidates and targets. Applications of single-cell sequencing in drug development include identification of drug candidates and drug targets, drug resistance, and drug responses and toxicities [10].

Single-cell sequencing has shown potential to advance early detection and noninvasive monitoring. This concept is being elucidated by studies on circulating tumor cells (CTC), ultimately providing insight on metastatic dissemination (Navin 2015). CTC studies also aid in understanding evolution of the genome in early stages of cancer by identifying clones (and their underlying genetics, transcriptomics, etc.) that invade surrounding tissue.

Algorithms like Wanderlust and Monocle allow one to reconstruct the transcriptional dynamics of development, differentiation, and/or clonal evolution from single-cell transcriptome data. Given such insight, we look forward to single-cell sequencing’s ability to identify transcriptional signatures of tumor states, which will strongly facilitate treatment decisions and healthcare strategies for patients.

5 Perspectives

Single-cell sequencing will transform cancer research over the coming years, as even initial experiments have revolutionized our current understanding of gene regulation and disease. Indeed, the data available with single-cell techniques have never been obtainable before. Since the initial single-cell sequencing experiments, there have been many technical and experimental advances, and the field continues to advance at a remarkable speed.

Drawbacks to single-cell sequencing include loss of tumor characteristics including spatial information, intratumor heterogeneity, and important cell-to-cell interactions. This issue stems from the fact that single-cell preparation and isolation techniques require intact single cells to be dissociated from fresh tissues. Most single cells are derived from a biopsy or small piece of tissue; therefore, single-cell sequencing may not accurately represent the underlying genome/transcriptome of the original tumor. Even the process of dissociating single cells from tissues may alter the cells and their underlying gene expression. In addition, microfluidic devices may lose entire cell populations and may be biased toward certain cell sizes, which, along with inherent weaknesses of selective amplification, can skew results.

With increasing amounts of complex data generated by single-cell sequencing techniques, there exists the dilemma of how to interpret the data accurately and how to handle the sheer quantity generated. While many tools for analysis have been developed, there is a need for further analytic improvement in noise filtering and scalability [43]. Some of the above issues, particularly the loss of spatial information, can be overcome by newer single-cell analysis techniques, but this highlights the need for skilled bioinformaticians to accurately analyze the data. In addition, there is currently no universal analysis technique available, which leaves room for additional bias. Another challenge that limits the wide application of whole genome single-cell sequencing in the research and clinical settings is its relatively high cost compared to bulk sequencing techniques. For example, a single-cell RNA seq experiment using the Chromium System from 10× Genomics currently costs at least 10 times more than a typical bulk whole genome RNA seq experiment. However, new and cheaper techniques are being developed, which will greatly improve the accessibility of the technology as time goes on.

Emerging technologies that combine single-cell sequencing with other techniques can acquire even deeper and richer genomic/biological information from cells and tissues. Spatial transcriptomics, which integrates single-cell RNA seq with in situ hybridization (ISH), is one such technology. It analyzes intact tissue sections on slides and does not require cell isolation from tissue. The process involves the placement of histological sections on slides that contain reverse transcription primers with unique positional barcodes and subsequent placement of millions of oligonucleotides in micrometer subsections. This is followed by reverse transcription [23]. This method has the potential to overcome the loss of spatial information, intratumor heterogeneity, and the potential alteration of cells during the process of dissociation. Application of such technologies in clinical samples could potentially revolutionize patient care.