Introduction

Pluripotent stem cells (PSCs) such as embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs) possess the potential to generate all of the cell types that are found in an embryo. Given this potential, patient-derived iPSCs and their differentiated counterparts are convenient disease models in basic research. They are also useful for evaluating the efficacy and safety of novel drugs on human cells well before they are tested in expensive clinical trials. Additionally, PSCs are promising agents for therapy or for the regeneration of cells and organs that are damaged by disease or injury.

As an increasing number of clinical applications for PSCs are being investigated, reprogramming technologies and culture conditions are being developed for the derivation and maintenance of suitable PSC lines. Episomal vectors, non-integrating viruses and reprogramming mRNAs have been developed to obtain footprint-free iPSCs, while chemically defined culture media have been used to keep the cells under xeno-free conditions [1–3]. Furthermore, methods have also been developed for the successful reprogramming of lymphocytes that are readily obtained from patient blood samples [4, 5].

The rapid development of PSC technologies has led to the creation of numerous PSC lines that differ in origin, derivation, passage number, and culture conditions, all of which have been reported to have an impact on the cells’ self-renewal properties, differentiation potential, gene expression, genetics, epigenetics and karyotype [6–11]. Such diversity in methods and cell lines necessitates the routine and comprehensive characterization of each line in order to verify its pluripotency, quality, and suitability for its intended purpose.

The complete characterization of PSCs, starting from the time they are derived to the time they are used, is best described in terms of iPSCs (Fig. 1), which are now being generated at a much faster pace than ESCs. Early during reprogramming, iPSCs are first characterized in order to accurately identify, pick and expand the clones-of-choice. It is critical that the methods used for this are rapid and easy, and preferably do not affect cell viability or growth. These constraints limit the characterization options to observing colony morphology and checking PSC surface antigen expression or Alkaline Phosphatase (AP) expression using live cell protocols [12–15]. As fewer clones are selected and expanded, more detailed analyses are performed. First, the undifferentiated cells are checked for expression of self-renewal markers, often through immunostaining or quantitative reverse transcription polymerase chain reaction (Q-RT-PCR) and more comprehensively through high-throughput transcriptome and epigenome analyses [16, 17]. Second, the cells are verified to spontaneously differentiate into the three germ lineages, most commonly through in vitro embryoid body (EB) formation or more definitively through in vivo teratoma formation [18, 19]. With the combination of self-renewal marker expression and trilineage differentiation potential, the cells are confirmed to exhibit functional pluripotency. Subsequently, the cells are karyotyped to rule out behavior-altering genetic abnormalities that may have been acquired as a result of their transition to pluripotency or exposure to particular reprogramming methods [8, 20–24]. Together, the tests to confirm functional pluripotency and normal karyotype are essential for documenting the characteristics of newly established PSC lines [25]. An example of such characterization is shown in Fig. 2.

Fig. 1

Diagram of the iPSC characterization workflow starting from the reprogramming process. Common characterization studies are found in solid boxes, while less common analyses are in dotted boxes

Fig. 2

Representative characterization panel for a newly derived iPSC line. The minimum requirement for the characterization of a newly derived iPSC line is to show the expression of multiple self-renewal genes through cellular and molecular analyses, to demonstrate the ability to differentiate into the three germ lineages, and to confirm normal karyotype. [A] In this panel, a newly derived feeder-dependent iPSC line is characterized using a variety of self-renewal markers. Live iPSCs are stained using a fluorescent substrate for AP or using antibodies against the positive surface markers SSEA4 and TRA-1-60 and the negative surface marker CD44. Images are shown as a merge of the phase contrast with the fluorescence channels. Fixed iPSCs are stained with antibodies against the intracellular self-renewal markers NANOG, OCT4, and SOX2, along with DAPI as a nuclear stain. [B] Additionally, iPSCs are allowed to form EBs and differentiate spontaneously for 21 days, then tested for the ability to generate the three germ lineages. The Day 21 EBs are stained with DAPI and antibodies against Alpha Fetoprotein (AFP) for endoderm, Smooth Muscle Actin (SMA) for mesoderm and Beta-III-Tubulin (TUJ1) for ectoderm. [C] For the molecular analysis, the feeder-dependent iPSCs are used to make feeder-free cultures and Day 7 EBs. Gene expression is analyzed using the TaqMan® hPSC Scorecard™ Panel and is compared against a known set of reference samples using data analysis software based on Bock et al. [7, 126]. [D] Finally, the iPSCs are sent to Cell Line Genetics for G-banding and are found to have a normal male karyotype in 20 cells analyzed (46,XY[20]).

Since pluripotency, differentiation capacity and genetic stability may change due to adaptation to passaging and culture conditions [21–23, 26], testing for functional pluripotency and karyotype is also important for monitoring the quality of established PSC lines as they are maintained. On the other hand, other PSC characterization assays like genetic profiling, HLA typing and microbial testing are used to determine cell identity and ensure cell quality and safety. These tests are particularly critical when cell lines are intended for banking or therapeutic use.

In summary, cell characterization is done continually throughout the stem cell workflow, although the specific assays will vary according to the situation. This review provides an overview of current practices and challenges encountered in the characterization of PSCs. It also discusses some of the trends that have been observed in the field.

Determining Pluripotency

Observing Morphology

Human PSCs grown on feeders exhibit a distinct morphology in which cells have a prominent nucleolus, exhibit a high nucleus to cytoplasm ratio, and form tightly organized colonies with defined edges [15, 27]. A detailed study by Wakao et al. [28] described the ideal feeder-dependent iPS colony quantitatively as being composed of a single layer of cells at a density of 5900 cells/mm², where each cell is around 43.5 μm² and exhibits a single nucleolus, a nucleus to nucleolus ratio of about 2.19, and a nucleus to cytoplasm ratio of about 0.87. Measurement aside, an expert may quickly and inexpensively assess cell and colony morphology to distinguish good colonies from unreprogrammed, differentiating, or unhealthy colonies that have rough edges or contain larger, flatter, loosely organized cells [13, 15]. The difficulty with relying solely on morphology is that it requires significant experience in inspecting PSC cultures, and PSCs cultured under different conditions can have slightly different morphologies [29–31]. To provide valuable references for researchers still learning to judge PSC morphology, stem cell labs and banks may keep representative images of ideal colonies for each cell line, along with the undesirable areas of differentiation [17]. Despite the availability of such resources, the assessment of morphology is best used in combination with other methods, such as the assessment of self-renewal markers, rather than as the sole basis of PSC characterization when performing critical experiments.

Analyzing Self-Renewal Markers

Since the first derivation of hESCs and hiPSCs, many studies have utilized transcriptome analyses to compare different lines to their differentiated counterparts [32, 33], pooled total RNA [34, 35], embryonal carcinomas and cancer cells [36, 37], as well as mouse ESCs [33, 36]. Such global surveys have shown that hESCs and pluripotent embryonal carcinomas are similarly enriched for hundreds of genes, including OCT4, FLJ10713, DNMT3B, FOXD3, SALL2, GABRB3, and TDGF1, suggesting that these genes play a role in maintaining pluripotency [37, 38]. A number of miRNAs like miR371 and miR302 have also been found to be high in PSCs but low or absent in differentiated cells, implying that the regulation of pluripotency is not limited to protein-coding genes [39–42]. Finally, hESCs and mESCs are similarly enriched for genes like OCT4, SOX2, NANOG, REX1, TDGF1, and FLJ10713, indicating that mechanisms for pluripotency are conserved to some extent [33, 36]. These comparisons have led to a better understanding of pluripotency, and the publication of the aforementioned gene expression signatures has paved the way for the development of in silico tools to predict pluripotency based on transcriptome data, as discussed later. Moreover, they have revealed potential markers that may be individually detected to positively identify hPSCs, either through cellular methods (Table 1) or molecular analyses (Table 2).

Table 1 Methods for Cellular Analysis of Self-Renewal Markers
Table 2 Methods for Gene Expression Analysis

The International Stem Cell Initiative (ISCI) [16] suggested a core set of markers that had already been verified to play a role in pluripotency or had been reported to be consistently expressed in 59 hESC lines, including NANOG [43, 44], TDGF [45], OCT4 [46], GABRB3 [32], GDF3 [36], DNMT3B [36, 37], the keratan sulfate antigens TRA-1-60, TRA-1-81, and the glycolipid antigens SSEA3 and SSEA4 [47–50]. Beyond these markers, PSCs may also be identified through the detection of SOX2, REX1, and hTERT expression, as well as AP activity [12, 14, 51]. The same markers can be used to identify emerging iPSCs except for OCT4 and SOX2, which are artificially overexpressed to initiate reprogramming [52]. SSEA4, AP, NANOG, GDF3 and hTERT have been detected in partially and fully reprogrammed clones while TRA-1-60, DNMT3B, and REX1 have been shown to be more stringent markers for fully reprogrammed iPSCs [51]. During the directed differentiation of hESCs into the three germ lineages, OCT4, SSEA3 and TRA-1-60 have consistently been observed to be downregulated faster than AP and NANOG [53].

A wide variety of methods are used to individually detect self-renewal markers, including fixed-cell chromogenic staining or live fluorescent substrate-based staining for the detection of AP activity [12, 14]. Q-RT-PCRs and endpoint RT-PCRs are preferable for generating quantitative or semi-quantitative results and are particularly useful when antibodies are not available. On the other hand, antibody staining is suitable for cellular analyses and is applicable for imaging and flow cytometry. Unfortunately, immunostaining of intracellular proteins like OCT4, NANOG and SOX2 requires cell permeabilization and the termination of the culture. Interestingly, a recent report described the identification of PSCs by non-invasively observing blue fluorescence generated by endogenous retinoid-sequestering lipid bodies [54]. This fluorescence was said to correspond with the expression of OCT4, SOX2 and NANOG, though it was also found to be limited to primed PSCs.

Given that limitation, a preferable alternative for live cell detection of intracellular markers may involve the use of molecular beacons. Molecular beacons are short single-stranded oligos that bind to the complementary mRNA-of-interest, allowing the attached reporter to fluoresce [55]. In the absence of this target, the oligo self-anneals into a stem-loop structure, causing the quencher on the opposite end of the oligo to quench the fluorescent reporter [55]. Molecular beacons have been successfully used for the detection of OCT4 in hESCs [56], and the flexibility in synthesizing different oligos opens up the possibility of detecting many more markers, including non-protein coding RNAs. However, it must be noted that molecular beacons need to be delivered into the cells, and depending on the method employed, transfection may be inefficient or even toxic to the cells.

The detection of individual markers using the above methods is not sufficient to identify PSCs. Many genes are expressed in multiple tissues, as illustrated by the expression of SOX2 in neural progenitor cells along with PSCs [57]. Moreover, these genes may have significantly different splicing variants like OCT4, which has an A isoform relevant to pluripotency and a B isoform that is not [12]. Therefore, the self-renewal markers are best analyzed in combination using validated detection methods. Today, antibody and Q-RT-PCR panels for PSC identification are offered by a variety of companies. Interestingly, OCT4, NANOG and SOX2 must be in a specific equilibrium to induce or maintain pluripotency, meaning that even the presence of the three genes does not necessarily guarantee that a cell is a PSC [58]. Indeed, using more markers increases the likelihood of identifying the cells correctly, and this has driven the trend towards using multi-gene expression panels.

That said, positive marker panels, even large ones, only inform the investigator of the presence or absence of pluripotent stem cells; they fail to distinguish partially reprogrammed cells or detect contamination by differentiated cells [12, 30]. Accordingly, there has been a trend in the validation and use of negative markers like SSEA1, A2B5, CD56, GD2, GD3, CD13, and CD44 that can be used together with positive markers [16, 48, 59, 60]. Flow cytometry with combinations of positive and negative markers is useful for monitoring the progression of the reprogramming process [59–61].
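
As a rough illustration of how positive and negative markers can be read together, the following Python sketch assigns each cell to a category from boolean gate calls. The marker choice (TRA-1-60 and SSEA4 as positive markers, CD44 as a negative marker) follows the panel in Fig. 2, but the category labels and the decision rules are simplifying assumptions made for illustration, not a validated gating strategy.

```python
# Illustrative only: marker names come from the text; the rules are assumptions.
POSITIVE_MARKERS = ("TRA-1-60", "SSEA4")
NEGATIVE_MARKERS = ("CD44",)


def classify_cell(gates):
    """Classify one cell from a dict mapping marker name -> True/False gate call."""
    positives = [gates[m] for m in POSITIVE_MARKERS]
    negatives = [gates[m] for m in NEGATIVE_MARKERS]
    if all(positives) and not any(negatives):
        return "putative PSC"
    if any(positives):
        # Some, but not all, criteria are met (hypothetical intermediate class).
        return "intermediate"
    return "non-PSC"


def summarize(cells):
    """Count how many cells fall into each category."""
    counts = {}
    for gates in cells:
        label = classify_cell(gates)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Made-up gate calls for three cells.
EXAMPLE_CELLS = [
    {"TRA-1-60": True, "SSEA4": True, "CD44": False},
    {"TRA-1-60": False, "SSEA4": True, "CD44": True},
    {"TRA-1-60": False, "SSEA4": False, "CD44": True},
]
print(summarize(EXAMPLE_CELLS))
```

In practice the gate calls would come from thresholds set against unstained and isotype controls; the sketch only shows why a negative marker changes the interpretation of a cell that stains for a positive marker.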

Evaluating Differentiation Potential

Spontaneous Differentiation via EB Formation

Aside from the analysis of self-renewal markers, one of the key tasks in characterizing a PSC line is to functionally confirm its ability to give rise to the three germ layers: the ectoderm, mesoderm and endoderm. In mice, it is possible to do this directly by testing chimera formation, germline transmission, and tetraploid complementation in vivo [62]. It is not possible to conduct similar experiments with human PSCs. As such, scientists have turned to in vitro assays involving random differentiation with EB formation or directed differentiation. However, the closest alternative to chimera assays in the hPSC field is the teratoma formation assay. These three types of differentiation assays are discussed here.

EB formation refers to the process in which PSCs deprived of pluripotency signals grow and start to differentiate in suspended aggregates [18, 63]. EBs can be graded as simple EBs, which are solid aggregates of cells that appear similar to the morula, or they can be large cystic EBs that develop a lumen similar to a blastocyst. They are formed in a number of ways which are reviewed by Kurosawa [64]. One way involves scoring PSC colonies or using mild trituration to generate smaller cell aggregates, and culturing them in a non-adherent polystyrene dish [12]. This is the simplest method for generating EBs, but it results in aggregates with heterogeneous shapes and sizes that lead to asynchronous differentiation and development [65]. For mPSCs, the hanging drop method has enabled the generation of EBs with homogeneous and defined sizes. This method involves harvesting the mPSCs as a single-cell suspension and allowing a defined number of cells to collect and cluster at the bottom of the drops. The fact that hPSCs are susceptible to apoptosis after dissociation has prevented the adoption of this method, but ROCK inhibitors are now being used to improve hPSC single cell survival [66]. More recently, different groups have explored EB formation through forced aggregation, where single-cell suspensions are placed in round-bottom wells or triangular microwells and cells are allowed to collect through gravity or centrifugation [67, 68].

After growing in suspension, the fully-formed EBs may be used for random differentiation or for directed differentiation. In random differentiation, the EBs are kept in suspension or transferred onto an adherent surface, and are grown without FGF2 over 7 to 21 days [69]. Trilineage differentiation can then be confirmed through immunostaining or RT-PCR of lineage-specific or differentiated cell-specific markers. Some commonly used examples include TUJ1 for ectoderm, SMA for mesoderm and AFP for endoderm, although AFP is also known to be expressed by visceral endoderm [60].

It is a matter of concern that there is currently no standard method for generating, differentiating, and analyzing EBs since different methods, media compositions, and EB sizes can influence the differentiation trajectory [64, 65]. For example, the forced aggregation method in triangular microwells simulates hypoxic conditions and tends to favor cardiac differentiation [67]. This favored trajectory has been exploited to direct stem cell differentiation into cardiomyocytes [70]. EBs generated through various means have also been used for the directed differentiation of pancreatic cells [71], osteoclasts [72], skeletal muscle cells [73], and dopaminergic neurons [74].

Directed Differentiation

Directed differentiation protocols tend to involve a series of treatments simulating the signals that a cell receives during successive stages of development, starting from germ lineage induction, followed by patterning or specification. For example, one widely used protocol for inducing PAX6+ neural ectoderm with >80% efficiency involves inhibiting the BMP and TGFβ pathways in a PSC monolayer using Noggin and the small molecule SB431542 [75]. While the original protocol predominantly generates cells of the central nervous system, additional WNT pathway activation using the small molecule CHIR99021 induces SOX10+ neural crest cells. Subsequent BMP4 and EDN3 treatment specifies KIT+ and MITF+ melanoblasts, and following up with SCF, EDN3, FGF2, CHIR99021, cyclic AMP, BMP4, and B27 allows differentiation into TYR+ OCA2+ melanocytes [76, 77]. Similarly, IDE1 or IDE2 molecules can be used to activate the TGFβ pathway, inducing SOX17+ definitive endoderm with up to 80% efficiency [78]. These cells can be further differentiated into PDX1+ pancreatic progenitors using the ILV small molecule to activate PKC signaling [79]. Finally, for cardiomyocytes, one protocol induces KDR+ PDGFRA+ mesoderm from EBs using Activin A, BMP4 and FGF2; specifies the cardiac lineage through treatment with VEGFA, DKK1, SB431542 and dorsomorphin; then triggers cardiomyocyte differentiation using VEGFA and FGF2 [80, 81]. Many more protocols have been published recently for differentiating a variety of ectodermal [82–86], mesodermal [87–90] and endodermal lineages [91–96]. Some laboratories have even taken the PSC differentiation process further to generate organoids or 3D tissue structures that mimic human tissues [95, 97–99], allowing the analysis and manipulation of human tissues in a different way.

Altogether, treatments can be done using recombinant cytokines and morphogens or cheaper small molecule counterparts; starting materials may be cell aggregates similar to those used during random differentiation or PSC monolayers that are technically simpler to generate. No matter what the exact protocol is, it is necessary to have sensitive methods and definitive markers for the detection, quantification and enrichment of the desired cells. Such methods include immunohistochemistry, flow cytometry, in situ hybridization, RT-PCR and less frequently, functional assays. Through the use of these methods, it is clear that there have been steady improvements in the efficiency, specificity, and functional maturity achieved by the various differentiation protocols. Such improvements are necessary if the differentiated cells are to be used for therapeutic purposes, disease research and drug screening in the future. However, in the early characterization of PSCs, directed differentiation is performed simply to check the ability of the cell line to generate a specific cell type-of-interest. Such a test is valuable because different PSC lines can have varying propensities for differentiation and varying levels of responsiveness to a particular differentiation protocol.

Teratoma Formation

The teratoma assay involves injecting PSCs into the kidney, the liver, the testis or the leg of an immunodeficient host mouse [100]. The following incubation period allows the PSCs to form a tumor and differentiate into cells from the different germ lineages. The assay often results in the formation of higher order structures through a combination of signals from the 3D environment, direct cell interactions, and exposure to morphogens [101]. Some reported structures have been recognizable as parts of the kidney and the gastrointestinal tract [27, 102, 103]. Often, tissues are also present outside of these higher order structures and are not easily distinguishable.

Histological analyses by a trained pathologist assist in the identification of tissues found in higher order structures and are particularly useful for the tissues found in isolation. However, histological analyses carry a level of uncertainty in identifying the tissues and cannot distinguish donor from host cells, a definite concern because donor and host cells have been known to form structures together. To definitively show that the donor PSCs are capable of forming the three germ layers, in situ hybridization or immunostaining must be performed, both to distinguish donor from host cells and to identify derivatives of ectoderm, mesoderm and endoderm. Human Nuclear Antigen and human CDH1 may be used for donor identification [104, 105]. Some of the common markers for detecting ectodermal derivatives include TUJ1, Keratin and DBH, which stain neurons, keratinocytes and adrenal cells, respectively. To detect mesodermal derivatives, one may use SMA for myocytes, CMP for bone, KLK1 for kidney, WT1 for the urogenital tract, and ACTC1 for the heart. For the endoderm, one may use A1AT or INS, which are expressed by hepatocytes and the pancreas, respectively [101]. One drawback to this approach is that many different tissue markers may need to be tested to detect the results of the random differentiation. As with self-renewal marker staining, markers for certain tissue types are rarely expressed exclusively in those cell types. Therefore, combinations of tissue-specific markers may be more reliable.

The teratoma assay has the advantage of demonstrating functional pluripotency under physiological conditions. It is therefore considered the gold standard for showing pluripotency of human cells and has been used for the characterization of many newly derived human ESC and iPSC lines [13, 15, 27, 106, 107]. That said, fewer than half of the published ESC and iPSC lines have been validated using teratoma assays [19], with usage of the assay seemingly concentrated around method-pioneering studies. Among the studies that published the results of teratoma assays, methods varied greatly in terms of the number and preparation of cells, the site of injection, and the length of the incubation period. In one published protocol, 1 million cells are injected into the testis capsule of CbySmn.CB17-Prkdc SCID/J mice and incubated for 6–12 weeks, at which point a tumor is expected to have grown to an externally palpable size [108]. In a more recent protocol, 100,000 to 500,000 cells are injected subcutaneously into a NOD/SCID mouse and incubated for up to 30 weeks, with the extended incubation providing increased sensitivity for assays using as few as 100 PSCs [109]. It is clear from the latter publication that cell number and incubation period affect the results of the assay, and there is further evidence that cells can respond to their direct environment, preferring to differentiate into specific lineages depending on the injection site [103, 110, 111]. This suggests that teratoma assays are vulnerable to inconsistencies arising from differences in protocol, which makes it difficult to compare results between studies.

In 2010, Muller et al. called for the standardization of the protocol in order to boost the assay’s utility and status as the gold standard [19]. However, as described above, different protocols continue to be published and are likely to cause continued confusion. What remains consistent is that teratoma formation is a lengthy and laborious experiment that carries the heavy cost of housing and monitoring about three host mice per line for the duration of the experiment [30]. With these prevailing issues, the utility of the teratoma assay and its position as a gold standard are being called into question, with some laboratories exploring low-burden, time-saving alternatives like in vitro differentiation assays and high-throughput profiling coupled with computer predictions [112].

Confirming Cell State

Gene Expression Profiling

Gene expression profiling is often performed to identify or confirm a cell type or cell state. For instance, it has been performed to check whether PSCs have properly differentiated into a specific target cell [113, 114]. It has also been done to support or challenge the equivalence of ESCs and iPSCs [7, 115–120], as well as to test the quality of PSCs generated through different methods [13] or maintained under different conditions [116, 121]. In the latter types of studies, the iPSCs are subjected to hierarchical clustering where true PSCs are expected to cluster with ESC controls and away from the parental somatic cells [12, 30].
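
The clustering logic described above can be sketched in a few lines of Python. The example below is a minimal, dependency-free sketch that uses a correlation-based distance and average linkage; these choices, the sample names, and the expression values for the four genes (three self-renewal genes and the fibroblast gene COL1A1) are assumptions made purely for illustration and are not taken from any cited study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def distance(x, y):
    """Correlation distance: 0 for identical patterns, up to 2 for opposite ones."""
    return 1.0 - pearson(x, y)


def cluster(profiles, n_clusters=2):
    """Average-linkage agglomerative clustering of {sample name: expression list}."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(distance(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]


# Made-up log2 expression values for OCT4, NANOG, DNMT3B, COL1A1 (in that order).
EXAMPLE_PROFILES = {
    "ESC_control": [10.0, 9.5, 9.0, 2.0],
    "iPSC_clone": [9.8, 9.0, 8.7, 2.5],
    "fibroblast": [2.0, 1.5, 2.2, 11.0],
    "clone_B": [4.0, 2.0, 2.5, 10.0],
}
print(cluster(EXAMPLE_PROFILES))
```

With these made-up values, the putative iPSC clone groups with the ESC control while the hypothetical `clone_B` groups with the parental fibroblasts. Real analyses involve thousands of genes and established clustering libraries, so this only illustrates the principle.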

Most transcriptome studies are performed with microarrays, interrogating thousands of genes using predetermined sets of probes [32–35, 37, 116, 119]. Other useful high-throughput gene expression assays for PSCs include serial analysis of gene expression (SAGE) and massively parallel signature sequencing (MPSS), which involve tag-based sequencing of up to millions of transcripts and creating gene expression libraries that can be compared with existing ones [26, 36, 122]. Unlike microarray analyses, SAGE and MPSS enable the analysis of uncharacterized or novel genes or splice variants. More recent studies have begun using RNA-sequencing for deep-sequencing of transcripts [2, 123].

The high density assays above analyze thousands of genes, allowing the observation of unexpected differences in gene expression and the broad screening of PSC-related genes (Table 2). However, they require significant resources and many of these genes turn out to be irrelevant to stemness or differentiation. A cheaper alternative would be to use lower density Q-RT-PCR panels and focused arrays, selecting only tens or hundreds of genes with known PSC functions or with robust correlation to PSC identity in various studies (Table 2) [7, 16, 106, 124]. These assays maximize the PSC-related information and minimize the required resources, but they may do so at the expense of the big picture and the increased certainty offered by high density assays. As with the high density assays, an unknown sample must be compared to known pluripotent and non-pluripotent controls in order to verify pluripotency.
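
For Q-RT-PCR panels, the comparison of an unknown sample against a control is commonly expressed as a relative fold change. The Python sketch below uses the widely used comparative Ct (2^-ΔΔCt) calculation, which is not described in this review and is shown here only as one common way to make such a comparison. It assumes roughly 100% amplification efficiency for both the target and the reference (housekeeping) gene, and the Ct values are made up.

```python
def fold_change(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression of a target gene in a sample versus a control.

    Each Ct is first normalized to a reference gene measured in the same
    material (delta Ct), then the control is subtracted (delta-delta Ct).
    Assumes the amount of product doubles every cycle.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Made-up example: the target crosses threshold 4 cycles earlier in the sample
# than in the control, with identical reference-gene Ct values.
print(fold_change(22.0, 18.0, 26.0, 18.0))  # 16.0
```

A lower Ct means more starting template, so a sample whose normalized Ct is 4 cycles lower than the control corresponds to a 16-fold higher relative expression under the stated efficiency assumption.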

In silico tools have been developed to analyze gene expression profiles and directly predict sample pluripotency. PluriTest™, a first generation open access tool, defines the PSC phenotypes based on high density microarray data from hundreds of PSC and non-PSC samples generated by different labs. Through a machine-learning method, it analyzes transcripts that distinguish pluripotent and non-pluripotent cells, as well as transcripts that deviate from the expected PSC gene expression pattern. PluriTest™ thus assigns unknown samples with ‘pluripotency scores’ and ‘novelty scores’ that are graphed in relation to the expected range for PSCs [125]. Similarly, Bock et al. [7] employ a bioinformatic algorithm to compare putative PSCs to well-characterized PSC references, noting genes expressed outside the expected range for PSCs and measuring each sample’s propensity for differentiating into the three germ lineages. The results of the comparison are summarized on a PSC Scorecard that could be used as the basis for excluding low quality cell lines that deviate from the references [126]. This principle is used in the Q-RT-PCR-based TaqMan® hPSC Scorecard™ Assay that targets a focused set of self-renewal and differentiation genes [126]. As a complementary tool to assess cells differentiated from PSCs, Cahan et al. [127] recently developed CellNet, a network biology platform that evaluates the correspondence between an engineered cell’s transcriptome and the target cell type’s gene regulatory network.
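
The idea of flagging genes that fall outside the range of reference PSC lines can be sketched as follows. This is not the algorithm used by PluriTest™, by Bock et al., or by the hPSC Scorecard™ Assay; it is a deliberately simple z-score comparison, and the gene choice, expression values, and the cutoff of 2 standard deviations are all assumptions made for illustration.

```python
import statistics


def flag_outliers(reference, sample, z_cutoff=2.0):
    """Return {gene: z-score} for genes whose expression in `sample` deviates
    from the reference PSC lines by more than `z_cutoff` standard deviations.

    reference: gene -> list of expression values across reference lines
    sample:    gene -> expression value in the line being tested
    """
    flagged = {}
    for gene, values in reference.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        if sd == 0:
            continue  # no spread in the reference; a z-score is undefined
        z = (sample[gene] - mean) / sd
        if abs(z) > z_cutoff:
            flagged[gene] = round(z, 2)
    return flagged


# Made-up log2 expression values for one self-renewal gene and one endoderm gene.
EXAMPLE_REFERENCE = {
    "NANOG": [9.0, 9.2, 8.8, 9.1],
    "GATA6": [2.0, 2.2, 1.9, 2.1],
}
EXAMPLE_SAMPLE = {"NANOG": 9.0, "GATA6": 5.0}
print(flag_outliers(EXAMPLE_REFERENCE, EXAMPLE_SAMPLE))
```

In this made-up example the sample expresses NANOG within the reference range but shows elevated GATA6, which a scorecard-style report might surface as a possible sign of differentiation. The published tools rely on far larger reference sets and more elaborate statistics.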

In silico tools are rapidly gaining favor because they are faster and cheaper than the in vivo pluripotency assays. Moreover, they bear no animal burden and provide a wealth of information that can easily be stored and reanalyzed as improved algorithms are developed [30]. However, there are hurdles that will need to be overcome before in silico models can completely supplant teratoma assays. Pioneering work from the field of cancer research has shown that the key to developing reliable in silico prediction tools lies in the quantity and quality of the reference samples used to develop the predictive models [128]. To improve the current models, large amounts of PSC and non-PSC data will need to be amassed and analyzed. To facilitate this process, Williams et al. [128] suggest that independent experiments be designed in a way that will allow the resultant data to be integrated into large datasets. This may be a challenge given the wide variety of assays available and the rapid development of analysis platforms, including next generation technologies like RNA-sequencing. The stem cell community is already taking steps in the right direction with the creation of the Embryonic Stem Cell Atlas from Pluripotency Evidence (ESCAPE) database, which incorporates published high content data on human and mouse ESCs [129]. Still, widespread adoption of in silico assays will depend on whether models can be developed to accept different datasets, whether they can be standardized, and whether they can be made widely accessible [17, 112]. If such a time comes, teratoma studies will likely remain important, but more as a tumorigenicity assay that can improve efforts in developing safe stem cell-based therapies [112].

Epigenetic Analysis

While the transcriptome has been the focus of many PSC characterization studies, there is also much interest in the epigenome. Global epigenetic analyses indicate that changes in histone modifications and DNA methylation are integral to PSC differentiation and consequently, to the reversion of differentiated cells into a stem-like state. For example, Hawkins et al. [130] found that the repressive H3K9 and H3K27 trimethylation of pluripotency and developmental genes increased in differentiated cells compared to ESCs. On the other hand, different groups have collectively identified thousands of differentially methylated DNA regions distinguishing between fibroblasts and iPSCs [131, 132]. On a gene level, it has been reported that the OCT4 and NANOG promoters exhibit increased DNA methylation, lose the permissive H3K4me3 and H3K9ac histone modifications, and gain the repressive H3K9me2, H3K9me3 and H3K27me3 marks when ESCs are differentiated into hepatocytes [133].

Given the apparent importance of epigenetics in PSC biology, epigenetic analyses have been used to compare ESCs and iPSCs, similar to what has been done with gene expression studies. Guenther et al. [118] demonstrated that H3K4 and H3K27 trimethylation is similar between ESCs and iPSCs. On the other hand, despite having similar chromosome-wide methylation patterns, iPSCs seem to have increased DNA methylation compared to ESCs, with differential DNA methylation concentrated close to centromeres and telomeres [131, 134]. This has been suggested to be the result of incomplete reprogramming, but it could also be due to the genetic variability between lines since there has been a report that isogenic ESCs and iPSCs are not significantly different [119]. All in all, different groups have confirmed that there are some differences in DNA methylation between ESCs and iPSCs as well as between iPSC lines [7, 132, 133], although the broad relevance of these differences is still unclear. Still, with methylation clearly impacting specific genes that influence pluripotency, it is likely that epigenetic characterization of PSCs will become increasingly important in the future.

There are many methods for targeted or global analysis of the epigenome (Table 3). Histone modifications are currently identified through chromatin immunoprecipitation (ChIP), where antibodies pull down specific histone modifications along with the associated DNA sequences [135]. In traditional ChIP, sequences-of-interest are specifically interrogated by PCR [133], whereas ChIP-Sequencing employs next generation sequencing technologies to enable a global survey of the precipitated DNA [118, 130]. DNA methylation can similarly be identified through Methylated DNA Immunoprecipitation (MeDIP)-PCR or MeDIP-sequencing, where antibodies against 5-methylcytosine pull down single-stranded methylated DNA [136]. However, such immunoprecipitation methods are limited by the performance of the antibodies. More common approaches to distinguishing DNA methylation include the use of methylation-sensitive restriction enzymes, which is inevitably limited by the distribution of restriction sites, and bisulfite conversion, which is not as limited but also tends to be more costly [137]. In all three approaches, specific sequences-of-interest are subsequently interrogated through Q-PCR and sequencing, while larger scale analyses are carried out via microarray or next generation sequencing.
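The logic of bisulfite-based detection can be illustrated with a short Python sketch. It relies on the well-established chemistry that bisulfite treatment deaminates unmethylated cytosines to uracil (read as thymine after PCR), whereas 5-methylcytosines are protected. The function names, the example sequence and the methylated position are hypothetical, and the sketch ignores incomplete conversion, strand handling and sequencing error.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate the read-out of a DNA strand after bisulfite treatment and PCR.

    Unmethylated C is read as T; methylated C (index in methylated_positions)
    stays C. All other bases are unchanged.
    """
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylated_positions(reference, converted):
    """Return indices where the reference has C and the converted read kept C."""
    return {
        index
        for index, (ref_base, read_base) in enumerate(zip(reference.upper(), converted))
        if ref_base == "C" and read_base == "C"
    }


# Hypothetical example: only the cytosine at index 1 is methylated.
reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1})
print(read)
print(call_methylated_positions(reference, read))
```

Comparing the converted read with the untreated reference is what allows methylation to be scored at single-base resolution, which is why bisulfite methods are not constrained by the distribution of restriction sites.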

Table 3 Methods for Epigenetic Analysis

Large scale variations on bisulfite sequencing include reduced representation bisulfite sequencing (RRBS), which initially uses a methylation-insensitive restriction enzyme to preferentially sequence CpG-rich regions [7, 138], and bisulfite padlock probes (BSPP), in which loci-of-interest are hybridized with user-selected padlock probes, creating DNA circles that can be amplified and analyzed using massively parallel sequencing [131, 139]. One large scale restriction enzyme-based analysis is called comprehensive high-throughput array-based relative methylation (CHARM) analysis and involves bisulfite treatment, digestion with McrBC and hybridization to a microarray [132, 140]. Data collected using such high throughput methods will be extremely useful for the development of predictive epigenetic assays, which are being touted as the next frontier in stem cell biology after predictive gene expression analysis [128]. So far, Bock et al. [7] have designed their bioinformatic scorecard to identify samples whose DNA methylation patterns fall outside the range of reference PSC lines, similar to how it identifies deviations in gene expression patterns.
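The idea of flagging samples whose methylation falls outside a reference range can be sketched as follows. This is only an illustration of the general principle and is not the scorecard implementation of Bock et al. [7]; the locus names, methylation values, the min/max definition of the reference range and the `margin` parameter are all assumptions made for the example.

```python
def flag_outlier_loci(reference_lines, sample, margin=0.1):
    """Return loci whose methylation in `sample` lies outside the reference range.

    reference_lines: list of dicts mapping locus -> methylation level (0 to 1),
                     one dict per reference PSC line.
    sample:          dict mapping locus -> methylation level for the test line.
    margin:          hypothetical tolerance added on both sides of the range.
    """
    flagged = []
    for locus, value in sample.items():
        ref_values = [line[locus] for line in reference_lines if locus in line]
        if not ref_values:
            continue  # no reference data for this locus, so it cannot be judged
        low = min(ref_values) - margin
        high = max(ref_values) + margin
        if value < low or value > high:
            flagged.append(locus)
    return flagged


# Hypothetical data: three reference lines and one test line.
reference_lines = [
    {"locusA": 0.10, "locusB": 0.80},
    {"locusA": 0.15, "locusB": 0.85},
    {"locusA": 0.12, "locusB": 0.78},
]
sample = {"locusA": 0.13, "locusB": 0.40}
print(flag_outlier_loci(reference_lines, sample))
```

A real assay would need a statistically justified definition of the reference range and a much larger panel of reference lines; the min/max rule here is chosen only for brevity.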

Assessing Cell Quality and Identity

Ensuring Genomic Stability

The previously described methods are important for testing a cell’s designation as a PSC, but methods like karyotyping or other genetic analyses are also important for ensuring the quality of those PSCs. Although there have been reports of hESC lines stably maintaining a normal karyotype for more than 100 passages in culture [141, 142], PSCs may sometimes develop genetic abnormalities through their derivation, passaging and culture for extended periods [8, 21–24]. For instance, Brimble et al. [26] observed that trisomies for chromosomes 12, 17, 1, 2, 8, and 14 frequently occurred in as few as 32 passages after starting to passage via single cell disaggregation. Because PSCs can accumulate deleterious karyotypic abnormalities, it is recommended that PSC lines be karyotyped regularly.

There are many methods that can be used for chromosomal analyses, such as Giemsa-banding (G-banding), spectral karyotyping (SKY) and fluorescence in situ hybridization (FISH). In G-banding, chromosomes are stained with the Giemsa dye, and the resulting banding patterns are compared to idealized diagrams to identify chromosomal aberrations. Banding pattern analysis is done at the ISCN 400 band level or higher for ≥8 metaphases, and chromosomes are typically counted for ≥20 metaphase spreads [17]. Despite the fact that G-banding of late metaphase chromosomes resolves 7–10 Mb compared to 1–2 Mb with SKY, G-banding is currently considered the standard method for karyotyping [63, 143]. Still, studies have shown that single nucleotide variants and small copy-number variations can also occur in PSCs, suggesting that a truly comprehensive cell characterization will require higher resolution analyses with comparative genome hybridization (CGH) microarrays, single nucleotide polymorphism (SNP) arrays or the more recently adopted whole genome sequencing method [8, 20, 21, 23, 144, 145].

Ascertaining Cell Identity and Histocompatibility

DNA Fingerprinting

DNA fingerprinting is concerned with ascertaining the cell line’s individual identity, a task that is grossly underappreciated considering the high frequency of cell switching or cross-contamination throughout the history of cell culture, the scale of research funding and hours wasted as a result of such events, as well as the adverse impact of these events on scientific progress [146–148]. The consequences of switching or cross-contamination in a clinical setting are particularly dire as using the wrong cell line for transplantation can lead to immune rejection by the patient. Thus, as PSC banking becomes more sophisticated and as stem cell therapies are developed, DNA fingerprinting of PSCs will likely be performed more often.

DNA fingerprinting involves the analysis of highly polymorphic DNA sequences, resulting in a signature that can be used to identify the donor of a cell line [149]. The current prevailing method involves PCR amplifying highly polymorphic short tandem repeat (STR) loci and analyzing the PCR products at high resolution through capillary gel electrophoresis, with STR polymorphisms producing PCR products of varying lengths [150]. The Federal Bureau of Investigation (FBI) Laboratory has selected 13 independently inherited core STR loci (CSF1PO, FGA, TH01, TPOX, VWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, and D21S11) for use in CODIS, the national DNA databank [151]. Companies have taken cues from the FBI and their foreign counterparts, providing DNA fingerprinting kits that analyze those 13 loci or more. Analyzing as few as 8 loci determines PSC line identity at a random match rate of 7.4 × 10⁻¹⁰ [150, 152], while analyzing 16 loci results in a significantly reduced random match rate of 7.2 × 10⁻¹⁹ [12, 106, 150]. With these odds, DNA fingerprinting ensures that the ESC or iPSC line in question is the exact one that is needed for the intended use.
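Because the loci are independently inherited, a random match rate can be estimated by multiplying per-locus genotype frequencies. The following Python sketch shows this product rule under the assumption of Hardy-Weinberg proportions (p² for a homozygous genotype, 2pq for a heterozygous one). The allele frequencies are invented for illustration and the function names are hypothetical; the sketch does not reproduce the published rates cited above, which depend on real population data and the specific loci used.

```python
import math


def genotype_frequency(p, q=None):
    """Expected genotype frequency at one locus under Hardy-Weinberg.

    p: population frequency of the first allele.
    q: population frequency of the second allele, or None if homozygous.
    """
    return p * p if q is None else 2 * p * q


def random_match_probability(profile):
    """Multiply genotype frequencies across independently inherited loci.

    profile: list of tuples, (p,) for a homozygous locus or (p, q) for a
             heterozygous locus.
    """
    return math.prod(genotype_frequency(*alleles) for alleles in profile)


# Hypothetical two-locus profile: one heterozygous and one homozygous locus.
profile = [(0.1, 0.2), (0.1,)]
print(random_match_probability(profile))
```

Each additional independent locus multiplies the estimate by another fraction, which is why the random match rate falls so steeply as more loci are analyzed.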

HLA Analysis

Human Leukocyte Antigen (HLA) proteins are members of the immunoglobulin supergene family that enable the immune system to distinguish between “self” and “non-self” cells. They determine the histocompatibility of donors and recipients during cell or organ transplantation, with HLA type mismatches correlating with the risk of transplant rejection [153–155]. There are two classes of HLA antigens that are coded by multiple polymorphic loci found in two separate regions of the major histocompatibility complex (MHC) system: Class I with 7678 alleles and Class II with 2268 alleles listed in the International IMGT/HLA sequence database as of the 3.14.0 release in October 2013 [156].

Given the highly polymorphic nature of the HLA loci, HLA typing can be used as an aid for determining PSC identity [12, 17]. More importantly, HLA typing is done in anticipation of the development of PSC-derived cell grafting therapies and ultimately aims to minimize the risk of allograft rejections [152]. In relation to this, several groups have proposed the establishment of HLA haplotype banks carrying clinical grade PSCs that would serve the needs of different populations [157–160]. Taylor et al. [159] determined that 150 common haplotypes can account for 93 % of the UK population, while Okita et al. [161] estimated that 140 haplotypes would match with 90 % of the Japanese population. In the same study, Okita et al. [161] performed HLA typing to show the utility of a newly generated iPSC line that shared haplotypes with 20 % of the Japanese population.
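The relationship between the number of banked haplotypes and population coverage can be explored with a simple model. The Python sketch below assumes, purely for illustration, that each banked line is homozygous for a single haplotype, that a recipient counts as matched when either of their two haplotypes is in the bank, and that haplotypes pair at random in the population. The haplotype names and frequencies are hypothetical, and the model is not the method used in the cited studies, so it does not reproduce their estimates.

```python
def population_coverage(bank, haplotype_freq):
    """Fraction of recipients carrying at least one banked haplotype.

    Assumes random pairing of haplotypes: a recipient is unmatched only if
    both of their haplotypes are absent from the bank.
    """
    covered = sum(haplotype_freq[h] for h in set(bank))
    return 1.0 - (1.0 - covered) ** 2


def lines_needed(haplotype_freq, target):
    """Greedily bank the most frequent haplotypes until `target` coverage.

    Returns the list of banked haplotypes, or None if the target cannot be
    reached with the haplotypes provided.
    """
    bank = []
    for hap in sorted(haplotype_freq, key=haplotype_freq.get, reverse=True):
        bank.append(hap)
        if population_coverage(bank, haplotype_freq) >= target:
            return bank
    return None


# Hypothetical haplotype frequencies in a population.
haplotype_freq = {"H1": 0.10, "H2": 0.05, "H3": 0.02}
print(population_coverage(["H1"], haplotype_freq))
print(lines_needed(haplotype_freq, target=0.25))
```

Under these assumptions, coverage grows quickly with the first few common haplotypes and then flattens, which is consistent with the general observation that a limited number of common haplotypes can serve a large share of a population.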

Typing of different loci can be done at high resolution or at low resolution [162]. Low resolution typing is generally faster and cheaper since it only involves defining serological types or the equivalent allele families; it makes use of serological assays or limited molecular characterization. High resolution typing uses an extended set of molecular assays to characterize the alleles in more detail. The individual molecular assays used in either level of typing include sequence-based typing (SBT) [163], amplification of polymorphic regions using sequence-specific primers (SSP) [163] and probing polymorphic PCR products with sequence-specific oligonucleotides (SSO) [164, 165]. Many of these assays are commercially available. There are no specific guidelines on the resolution and loci that should be used for PSCs, but groups conceptualizing the PSC HLA haplotype banks acknowledge the importance of typing HLA-A, -B and -DR [157–160]. A number of papers involving PSC characterization have already been published with HLA typing on at least these loci [12, 26, 106, 152].

Detecting Microbial Contamination

Microbial contaminants can be introduced to cell lines through contact or through the biological reagents used in cell culture like feeder cells, serum, and enzymes. In some cases, microbes are easily detectable as they cause the media to become turbid and acidic. However, some microbes like the mycoplasma species can escape filtration and remain undetectable while they influence the behavior of the cells in culture [166]. Microbial testing is thus done to ensure that the cells are free of microbial agents that can alter their behavior, as well as to safeguard the health of the researchers who handle PSCs and patients who use stem-cell derived treatments [17].

For academic purposes, it may be sufficient to periodically test for fungi and bacteria, and to test more frequently for mycoplasma because of the high incidence of mycoplasma contamination in cell cultures [12, 121, 167, 168]. For cell therapies or cell banking, PSCs must additionally be tested for dangerous viruses like hepatitis C, human immunodeficiency virus (HIV), human papillomavirus (HPV), herpes simplex virus (HSV), and human herpes viruses (HHVs) as determined by local policy and prevalence. There are many contract laboratories offering comprehensive microbial testing services, but it is important to select those that have the proper qualifications and offer the services that are appropriate for the intended purpose of the PSCs.

When performing mycoplasma testing internally, the most sensitive method for detecting culturable strains is to grow them in selective media over 28 days [17, 169]. A less sensitive but faster technique which can be used with non-culturable strains involves staining the DNA of inoculated Vero cells and visualizing the signal with high magnification UV-epifluorescent microscopy [17, 167]. Other methods include PCR amplification of mycoplasma DNA or the use of commercially available kits to detect mycoplasma-specific enzymes [17, 168, 169]. These methods can be used with a wide variety of strains, are even faster than DNA staining and are highly sensitive. However, the PCR method is prone to false positives and false negatives due to artefacts, PCR inhibition and small sample volume [170].

Conclusion

There is a very wide variety of techniques that can be used for PSC characterization from immunostaining to Q-RT-PCR to karyotyping. Techniques can be quantitative or qualitative, molecular or cellular, and they present their own sets of advantages and challenges. The appropriate analyses for any particular situation are determined by the investigator’s goal. For example, the identification of emerging iPSCs during reprogramming tends to make use of quick and qualitative analyses whereas the characterization of established clones can involve either quantitative or qualitative analyses and frequently involves the termination of cultures. On a higher level, the analysis of self-renewal markers, transcriptomes, epigenomes and differentiation capacity is useful for ascertaining pluripotency, whereas karyotyping, DNA fingerprinting, HLA typing, and microbial testing are quality control tests that ensure the identity and safety of the PSCs. In general, the cells require more extensive and more frequent characterizations using progressively defined methods as they are used for applications with increasing impact.

Stem cell banks and registries currently have the most defined and comprehensive criteria for the characterization of PSCs and their guidelines serve as useful references [17, 25]. That said, it would be prudent to additionally keep abreast of the literature since characterization practices are constantly evolving in response to rapid technological advances. As noted earlier, there is a trend towards using next generation sequencing for transcriptome and epigenome profiling. This trend may eventually spread into HLA typing, DNA fingerprinting, and genetic analysis, providing a comprehensive means to detect dangerous mutations or other genetic aberrations that would undermine the quality and safety of the cells. There is also a burgeoning trend toward supplanting the cumbersome teratoma assay with faster molecular and in silico analyses. This trend may or may not continue depending on the fulfillment of the conditions discussed earlier, but it highlights ongoing efforts to simplify the characterization workflow.

Despite the variety in the techniques available as well as the improvements that are currently being made, existing methods still leave much to be desired. Most of the tools and tests described provide a snapshot of the cells’ status at one point in time and do not permit real-time monitoring. As such, there is always a risk that cells will later change and no longer be suitable for the intended application, resulting in a waste of resources until the cells are again characterized. Moreover, most protocols require multiple steps and last anywhere from a few hours to months, presenting challenges to automation and scale-up. Indeed, molecular tests may lend themselves better to the analysis of many markers at one time, but most molecular and cellular methods described here are still not amenable to the analysis of many cell lines simultaneously.

Together, these observations indicate that PSC characterization is still a developing field. While many methodologies have been developed to assess cell pluripotency and quality, there is still a need to streamline and standardize characterization practices. Furthermore, there are still significant challenges preventing the automation and scale-up of the characterization workflow and, by extension, the PSC workflow. Addressing these challenges will be critical for PSCs to reach their full potential in disease research, drug discovery and therapy.