Induced Pluripotency and the Human Genetic Model Organism In Vitro

From the inner cell mass (ICM) of fertilized embryos, James Thomson first derived human embryonic stem cells (ESCs) [1]. These novel cells have two key properties: first, they are capable of indefinite cell division in culture (self-renewal), and second, as do their biological counterparts, they maintain the capacity to differentiate into all cells and tissues of the embryo and adult (pluripotency). The unchallenged advantage of human ESCs over other experimental cell systems has been this capacity for differentiation, either in vivo via teratoma formation [2] or in vitro via adherent or three-dimensional (3D) cell culture [3, 4]. An application for ESCs in disease research and regenerative medicine was immediately apparent; however, the embryonic source of material has remained an ethical controversy [5].

One decade has now passed since Kazutoshi Takahashi and Shinya Yamanaka first demonstrated that mouse somatic cells could be reverted through the expression of four transcription factors, Oct3/4, Sox2, Klf4, and c-Myc, to a primitive embryonic-like stem cell state [6]. Derivation of induced pluripotent stem cells (iPSCs) from human somatic cells (Fig. 1) followed shortly thereafter [7], marking a partial ethical resolution and profound technical contribution [8]. Compared to ESCs, iPSCs present an additional benefit for disease modeling and putative therapies: derived from a consenting individual, they represent personalized stem cells. Moreover, in contrast to ESCs from terminated embryos, iPSCs may be linked to the health and well-being of a living person, complemented by a recorded lifetime medical history. Thus, combined with in vitro differentiation to cells and tissues (Fig. 1), human iPSCs present a proxy by which individualized genetic variation may be accessed to understand the relevance to personal health [9].

Fig. 1. The source, characteristics, and applications of human pluripotent stem cells. Genome engineering is applied to generate or validate human models of development and disease.

Pluripotent stem cells (PSCs), whether ESCs derived from the human embryo or iPSCs derived through reprogramming, display key properties of a tractable genetic system: a short generation time (~15 h), a high proportion of cells in S-phase [10], indefinite proliferation, ease of culture, a propensity for DNA transduction, and selection by antibiotics or genetic complementation followed by clonal isolation and expansion. With more recent advances such as Rho-kinase inhibition for improved single-cell survival [11], and feeder-free cell culture methods using defined matrices and media [12, 13], the handling of human PSCs has become more akin to that of their murine counterparts, allowing single-cell passage and high-throughput 96-well clonal maintenance and expansion [14].

As is explored in the following sections, the marriage of iPSC technology with a new generation of genetic engineering tools has enabled the precise transfer of reporter or therapeutic transgenes, gene disruption, or even gene correction (Fig. 1). It is remarkable to reflect on the speed and relative ease with which both iPSC and genome engineering technologies have been adapted and merged for in vitro disease modeling and drug screening.

Rise of the Genome Editing Machines

Genetic manipulation by gene targeting is a mainstay of functional genomics. The fundamental principles of gene targeting first outlined by Mario Capecchi using positive-negative selection in mouse ESCs [15, 16] are duly applicable to human PSCs. However, even following these guidelines, the first gene-targeting experiments in human ESCs [17] indicated that gene-targeting rates would typically be lower than those observed in the mouse, leaving an obvious need for improvement.

Double-strand breaks (DSBs) in genomic DNA form naturally during DNA replication or in response to stresses such as ionizing radiation, and are vital for resolving recombination during meiosis and for producing immune system diversity [18]. DSB repair (DSBR) by end-resection and nonhomologous end-joining (NHEJ) can be inherently mutagenic, whereas homology-directed repair (HDR) can faithfully restore DNA sequence in the presence of a template donor DNA (such as the sister chromatid). It was therefore hypothesized that intentional formation of DSBs at target loci could enhance gene-targeting frequencies via a custom donor DNA [19, 20]. The demonstration that the FokI nuclease domain is separable from DNA-binding domains [21] suggested a method by which nucleases could be engineered with novel specificity.

Zinc-finger nucleases (ZFNs; Fig. 2a, top) built a foundation for recombinant endonuclease applications to enhance gene targeting through targeted DSBR, and remain a powerful tool in genome engineering even today [22]. Composed of a DNA-binding domain that encodes specificity and a FokI nuclease domain, ZFNs function as paired proteins that position and dimerize FokI monomers to cleave at a target locus [23]. The binding of zinc fingers to DNA triplets is modular [24]; however, finger–triplet interaction properties have been shown to be highly context and neighbor dependent, such that engineering custom ZFNs remains notoriously difficult. This problem has been partially addressed using validated libraries of submodules composed of two or three fingers [25]; however, screening for functional ZFNs is a resource-heavy endeavor.

Fig. 2. Common nuclease systems used to stimulate double-strand break repair (DSBR) at target genomic sites. (a) Zinc-finger nucleases (ZFN) and TAL effector nucleases (TALEN) are composed of dimeric nuclease domains addressed by engineered DNA-binding domains. The Cas9 nuclease is addressed by a synthetic guide RNA molecule. Endonuclease components are green; targeting components are blue. (b) DSBs recruit endogenous repair machinery allowing genetic modification by nonhomologous end-joining (NHEJ) or template-mediated homology-directed repair (HDR) pathways.

TAL effector nucleases (TALENs; Fig. 2a, middle) broke the barrier between technical novelty and practical application [26], priming research laboratories through in-house nuclease design and production [27]. Plant pathogenic Xanthomonas spp. secrete TAL Effector (TALE) proteins, which activate host gene expression, resulting in a metabolic advantage to the invader [28]. The TALEs represent a unique class of proteins that bind DNA in a 1:1 modality [29, 30], making engineered design of TALENs more straightforward than that of ZFNs. The nature of this protein–DNA interaction is mediated through polymorphic protein repeats that display little degeneracy and no obvious neighbor effects. In a large-scale in vivo screen in zebrafish, TALENs were found to be more mutagenic than ZFNs [31]. Presumably, the increased tolerance of the spacer region provides a larger substrate for exonuclease activity, resulting in broad deletions compared to the conservative resection observed using tightly juxtaposed ZFNs.

CRISPR/Cas9 (Fig. 2a, bottom) has stolen the proverbial ‘show,’ capturing the attention of academia and public alike as it rapidly progressed from discovery as the hunter–killer of a potent anti-phage adaptive bacterial immune system [32], to experimental modulation and design [33] for genome engineering purposes [34]. The Cas9 protein, a general endonuclease that produces DSBs through HNH and RuvC nuclease domains, forms a ribonucleoprotein (RNP) complex with bacterially processed short CRISPR RNAs (crRNA) and trans-activating crRNA (tracrRNA) that pair with foreign genomic DNA targets to address and activate nuclease activity [35]. Biochemical characterization of the key CRISPR components, by Jennifer Doudna and Emmanuelle Charpentier [33], indicated that programmable cleavage could be achieved through the custom design of a hybrid crRNA–tracrRNA single guide RNA molecule (sgRNA), which could be simply co-expressed or transfected as RNA along with the Streptococcus pyogenes Cas9 (SpCas9) protein. Thus, the CRISPR/Cas9 system could theoretically be programmed to cleave any 20-nt sequence upstream of a 5′-NGG-3′ protospacer adjacent motif (PAM). This report was immediately followed by back-to-back proof-of-principle experiments describing genome engineering in human cells [36, 37]. During the past 3 years, CRISPR/Cas9 technology has enabled gene knockouts across previously inaccessible genetic model organisms [38] and high-throughput genomic screens [39, 40], highlighting the simplicity of design and ease of application.
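The targeting rule stated above, a 20-nt protospacer lying immediately 5′ of an NGG PAM, can be illustrated by scanning a DNA sequence on both strands. The following Python sketch is illustrative only: the function names and the coordinate convention are assumptions made here rather than part of any published tool, and the scan ignores practical considerations such as guide activity and off-target potential.

```python
# Minimal sketch: list candidate SpCas9 sites (20-nt protospacer + 5'-NGG-3' PAM).
# All names are hypothetical and chosen for illustration.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_targets(seq: str, protospacer_len: int = 20):
    """Return (strand, start, protospacer, pam) tuples.

    `start` is 0-based and counted 5'->3' along the strand that was scanned,
    so "-" strand positions refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Slide a window of protospacer + 3-nt PAM along the strand.
        for i in range(len(s) - protospacer_len - 3 + 1):
            pam = s[i + protospacer_len : i + protospacer_len + 3]
            if pam[0] in "ACGT" and pam[1:] == "GG":
                hits.append((strand, i, s[i : i + protospacer_len], pam))
    return hits
```

For example, `find_spcas9_targets("A" * 20 + "TGG")` reports a single plus-strand site whose PAM is `TGG`.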

SpCas9 is by far the most commonly used CRISPR system. However, active variants from other bacterial species such as Staphylococcus aureus (SaCas9, 5′-NNGRRN-3′ PAM) [41] and Neisseria meningitidis (NmCas9, 5′-NNNNGATT-3′ PAM) [42] have been applied for genome editing with variable success. Because Cas9 proteins are of bacterial origin, their engineering and selection in prokaryotes is conventional, leading to new variants of SaCas9 with modified PAM specificity, and therefore broader targeting ranges [43]. Mining prokaryotic genome databases through homology, cloning, and functional validation has yielded family members with new properties. Differing from SpCas9, Francisella novicida Cpf1 (FnCpf1) requires only a single crRNA without a tracrRNA, and recognizes a 5′-TTN-3′ PAM, therefore accessing completely different sequence space [44]. The rich diversity of CRISPR systems in bacteria suggests that additional nucleases with distinct properties remain to be discovered.
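How PAM choice changes the accessible sequence space can be sketched by expanding each PAM's IUPAC codes into a regular expression and counting matching sites. The sketch below is a simplified illustration under stated assumptions: the table and function names are hypothetical, only the given strand is scanned, and a 20-nt protospacer is assumed for every nuclease even though preferred guide lengths differ between them. The PAM is placed 3′ of the protospacer for the Cas9 orthologs and 5′ of it for Cpf1.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# PAM sequences as quoted in the text; the side of the protospacer on which
# each PAM lies is encoded as "3prime" or "5prime" (hypothetical labels).
PAMS = {
    "SpCas9": ("NGG", "3prime"),
    "SaCas9": ("NNGRRN", "3prime"),
    "NmCas9": ("NNNNGATT", "3prime"),
    "FnCpf1": ("TTN", "5prime"),
}


def pam_regex(pam: str) -> str:
    """Translate an IUPAC PAM string into a regex fragment."""
    return "".join(IUPAC[base] for base in pam)


def count_sites(seq: str, nuclease: str, protospacer_len: int = 20) -> int:
    """Count candidate sites on the given strand (overlapping sites included)."""
    pam, side = PAMS[nuclease]
    p = pam_regex(pam)
    spacer = f"[ACGT]{{{protospacer_len}}}"
    body = spacer + p if side == "3prime" else p + spacer
    # A lookahead lets overlapping sites be counted individually.
    return len(re.findall(f"(?=({body}))", seq.upper()))
```

Running `count_sites` for each entry in `PAMS` over the same locus gives a rough feel for which nuclease can address a region that lacks a suitable NGG motif.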

Biochemical subtleties of DNA recognition and cleavage aside, engineered nucleases enhance random mutagenesis and gene targeting by eliciting endogenous DSBR pathways (Fig. 2b). As the NHEJ mutation spectrum is essentially random, it provides allelic depth for clonal cell panels, yet can complicate high-throughput screening [45]. Under special circumstances of genomic sequence context and DSB position, DSBR can be driven by subtle regions of microhomology to produce indels in a predictable manner [46]. Bi-allelic DSBs can allow for homozygous targeting by HDR, an event achieved rarely with classic gene targeting [47]. Combinatorial events, such as HDR-mediated targeting of one allele and NHEJ knockout of the other, can be a boon or a bane. Because of the promiscuity of the Cas9 protein, combinatorial approaches using multiplexed sgRNAs have led to multiple mutations in a mouse stem cell or embryo [48, 49], accelerating the analysis of multiple genetic interactions and emphasizing the power of the CRISPR/Cas9 nuclease system for functional genomics studies.

Adding Function to iPSCs Through Transgenesis at Safe-Harbor Loci

Gene targeting may be used to eliminate or alter endogenous genes, or introduce new functions. Transgenesis with viral or transposon systems has the advantage of being robust and rapid [14], yet as a trade-off does not directly control for integration site and therefore requires the use of populations or screening multiple clones to discern suitable or comparable gene expression levels. Nucleases permit transgenes to be introduced into defined loci and therefore minimize clonal variation by moderating position effects [50]. HDR-targeted transgenesis includes applications such as cDNA rescue of mutant genes and fluorescent knock-in alleles to report endogenous gene expression or simply label cells constitutively [51].

Perhaps the best-known “safe-harbor” locus is AAVS1, a hotspot for adeno-associated virus insertion located within intron 1 of the PPP1R12C gene [52]. AAVS1 is akin to the mouse ROSA26 locus [53], providing reliable transgene expression with no known phenotype resulting from homozygous transgene insertions [47]. Targeting and expression of cDNAs from the AAVS1 locus has rescued monogenic diseases such as X-linked chronic granulomatous disease [54] and α-thalassemia [55]. Conversely, overexpression of dominant negative ion channel genes KCNQ1 and KCNH2 from the AAVS1 locus can recapitulate Long-QT syndrome for the development of an isogenic in vitro drug-screening platform [56].

Other safe harbors, such as the X-linked hypoxanthine phosphoribosyltransferase 1 (HPRT1) locus, are permissive for constitutive expression [57], yet disruption of HPRT1 causes deficiency-spectrum diseases ranging from gout to Lesch–Nyhan syndrome. The human L-gulono-γ-lactone oxidase (GULOP) locus is a nonfunctional pseudogene [58, 59], so its disruption is presumed to avoid phenotypic effects; however, transgene expression from this locus in pluripotent and differentiated lineages is less well described. Beyond gene disruption, the local effects of potent transgenic promoters on endogenous gene expression must also be considered [60]. One locus selected with this concern in mind is the citrate lyase beta-like (CLYBL) locus, which lies in a gene-poor region of human chromosome 13 and is reported to confer less severe effects on local gene expression [61]. Finally, in mouse models of hemophilia A and B, expression of human factors VIII and IX from the endogenous albumin locus achieved long-term expression of transgenes at therapeutic levels [62]. Therefore, context-dependent safe harbors may be found in loci that are active in the target-differentiated cell type yet repressed in others, and not associated with a known haploinsufficiency phenotype.

Achieving Seamless Genome Engineering for Accurate Disease Models

In the interest of generating faithful models of human genetic disease, engineered changes that recapitulate single-nucleotide variations (SNVs) would be preferred over crude knockouts. The de facto test for evaluating the role of candidate mutations in disease is to repair the mutation in patient iPSCs, or to recapitulate it in otherwise normal iPSCs [63]; true correction or recreation of patient-specific mutations would require approaches that are free of residual foreign genetic elements.

Classic gene targeting [16] deposits positive antibiotic selection cassettes to enrich for HDR-mediated events (Fig. 3a). Retention of such elements is invaluable for producing knockouts, and reconcilable with the integration of reporters [47] or even therapeutic transgenes [64]. However, when the aim is to modify small regions of DNA or, in the extreme case, single nucleotides, selection cassettes and other elements can disrupt the native locus and may even cause unpredictable pleiotropic effects [65]. Removal of antibiotic selection cassettes is typically performed through site-specific recombinase-mediated excision [66]. In this approach, the recognition sites for Cre (loxP) or Flp (FRT) recombinases [67] flank the selection cassette, which is introduced juxtaposed to the mutation (Fig. 3b, left). Following the selection of targeted clones, the cassette is excised by transient recombinase expression, yet a nontrivial single recombinase site (34 bp in the case of loxP) remains. Although residual elements may be positioned in “neutral” genetic regions such as introns, unexpected effects on gene expression and the predicted phenotype remain possible [64].

Fig. 3. Derivation of human induced pluripotent stem cells (iPSCs) engineered by HDR. (a) Classic gene targeting events are enriched by positive selection. (b) Excision of antibiotic selection markers by Cre recombinase (left) leaves behind loxP sites, while PB transposase (right) removes cassettes seamlessly from endogenous or engineered TTAA tetranucleotides to deposit point mutations. (c) Mutation deposition by short single-strand oligonucleotides (ssODNs) obviates the need for excision, yet requires intensive screening.

As an alternative to recombinases, the piggyBac (PB) transposon undergoes high-fidelity seamless excision from mouse and human iPSCs [68], and has been developed as an excisable positive/negative selection cassette for genome modification [69]. One caveat is that PB elements excise only from TTAA tetranucleotides, such that a TTAA must be present, or silently engineered near the mutation (Fig. 3b, right). PB provides more flexibility and a subtler footprint than recombinases, yet excision frequencies are locus dependent, and reintegration of the transposon may occur stochastically, whereas excision-prone transposase variants [70] may display higher rates of mutagenesis.
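Because seamless PB excision depends on a TTAA tetranucleotide near the intended mutation, a first design step is to list the TTAA sites around the edit position. The Python sketch below shows one way to do this; the function name, the 0-based coordinates, and the 50-bp default window are assumptions chosen for illustration, not recommendations from the cited work. Since TTAA is its own reverse complement, scanning a single strand is sufficient.

```python
def nearest_ttaa(seq: str, mutation_pos: int, window: int = 50):
    """Return 0-based start positions of TTAA sites within `window` bp of
    `mutation_pos`, ordered from nearest to farthest."""
    seq = seq.upper()
    sites = []
    start = seq.find("TTAA")
    while start != -1:
        distance = abs(start - mutation_pos)
        if distance <= window:
            sites.append((distance, start))
        # Step by one base so overlapping occurrences are not skipped.
        start = seq.find("TTAA", start + 1)
    return [pos for _, pos in sorted(sites)]
```

An empty result indicates that no endogenous site lies within the window, in which case a TTAA would have to be silently engineered near the mutation as described above.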

Diverging from classic targeting vector-based genome modification relying on antibiotic enrichment, short single-strand oligonucleotide (ssODN) templates have been employed in combination with ZFNs [71, 72], TALENs [73], and CRISPR/Cas9 [48]. In this approach, ssODNs typically more than 100 nt in length carry sufficient homology to deposit point mutations into nuclease-cleaved loci in a single step without codeposition of foreign sequences (Fig. 3c), providing a clear advantage over recombinase-based methods [74]. It should be noted that ssODN-modified loci that retain the nuclease target site are potentially subject to recleavage and mutagenic NHEJ repair. Silent mutations that prevent nuclease recognition and recleavage detract from the subtlety of the method, but may be necessary to avoid additional screening. Moreover, aberrant ssODN insertions at on- or off-target sites [75] or random mutations on-target [76] may occur under normal conditions or as a reflection of oligo quality, and are extremely difficult to predict and detect. Although ssODN-mediated targeting events are frequent in cell lines, the low frequency of correct targeting in iPSCs (below 1 %) [48, 72, 73], compounded with possible mutagenic events, demands robust and sophisticated selection.
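The ssODN design logic described above can be sketched in a few lines: build an oligo carrying the point mutation flanked by homology arms, then check whether the SpCas9 PAM would survive the edit and leave the locus open to recleavage. This is a minimal sketch under simplifying assumptions: both function names are hypothetical, the 50-nt arm length is only an illustrative default, coordinates are 0-based on the plus strand, and the PAM check considers only an NGG motif at a position supplied by the user.

```python
def design_ssodn(reference: str, pos: int, new_base: str, arm: int = 50) -> str:
    """Return an ssODN with `new_base` at `pos`, flanked by up to `arm` nt
    of homology on each side (plus-strand sequence)."""
    reference = reference.upper()
    new_base = new_base.upper()
    if not 0 <= pos < len(reference):
        raise ValueError("pos lies outside the reference sequence")
    if reference[pos] == new_base:
        raise ValueError("new_base equals the reference base; nothing to edit")
    left = reference[max(0, pos - arm) : pos]
    right = reference[pos + 1 : pos + 1 + arm]
    return left + new_base + right


def pam_survives_edit(reference: str, pos: int, new_base: str, pam_start: int) -> bool:
    """True if the 3-nt PAM starting at `pam_start` still reads NGG after the
    edit, meaning the modified locus could in principle be cleaved again."""
    edited = reference[:pos] + new_base + reference[pos + 1 :]
    pam = edited[pam_start : pam_start + 3].upper()
    return pam[1:] == "GG"
```

When `pam_survives_edit` returns `True`, an additional silent mutation in the PAM or protospacer may be worth considering, at the cost of subtlety noted above.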

One advanced approach to detect correct gene editing employs serial population screening, a type of sib-selection for human iPSCs where mutation-containing populations are monitored by droplet digital polymerase chain reaction (ddPCR) and enriched using serial sub-fractionation [77]. In developing this technique, the authors successfully deposited mutations into five disease-associated genes (PHOX2B, PKP2, RBM20, PRKAG2, and BAG3). Population screening by ddPCR is robust but not trivial, requiring custom TaqMan assays, sophisticated instrumentation, and additional iPSC passages. A streamlined approach to derive gene-corrected iPSCs, which combined CRISPR/Cas9 gene targeting with the somatic cell reprogramming process [76], reported gene knock-in efficiencies as high as 5 % and ssODN-mediated gene correction rates as high as 8 %. Although useful during de novo iPSC derivation, this approach is obviously not applicable to previously established iPSC lines. Finally, frequencies of desirable targeting using ssODNs may still see improvements through lessons learned from the biochemistry of DNA opening and Cas9 cleavage. As the sgRNA nontarget (unbound) strand is released first, ssODNs with positioning and complementarity to the nontarget DNA strand are more effective at inducing HDR, reaching frequencies of up to 60 % in HEK293T cells [78]. Applications of these findings in iPSCs hold promise.

Avoiding Unwanted Outcomes: Off-Target Cleavage and Mosaicism

Nuclease cleavage of the genome is by no means infallible, and undesirable DSBR events may occur through surreptitious cleavage at sites other than the chosen target region. In these cases, DSBs repaired preferentially through NHEJ may result in subtle indels (Fig. 2b) with no capacity for counterselection. Such “off-target” effects may or may not have phenotypic consequences.

Unbiased off-target detection using whole-genome sequencing (WGS) can evaluate genome-engineered iPSC clones [79, 80], yet the depth of data and the threshold for detecting rare mutations argue against the practicality of the approach. Exome sequencing simplifies analysis, yet provides data for only a small portion of the genome. Targeted screening methods based on degenerate sequence similarity between the sgRNA and nontarget regions of the host genome provide an off-target candidate list that may be verified using conventional NHEJ detection methods such as the T7E1 hybrid-cleavage assay [81], Sanger sequencing with decomposition [82], or deep sequencing of amplified products [46]. However, these biased approaches are time-consuming and limited by the quality of prediction algorithms for candidate off-target sites.
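The similarity-based candidate list mentioned above can be illustrated with a simple mismatch count. The following Python sketch ranks PAM-adjacent genomic sites by their Hamming distance to the sgRNA; it is a deliberately naive illustration, not one of the cited prediction algorithms. The function names and the three-mismatch default threshold are assumptions, only the plus strand and NGG PAMs are considered, and bulges and position-dependent mismatch tolerance are ignored.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def candidate_off_targets(guide: str, genome: str, max_mismatches: int = 3):
    """Return (mismatches, start, site, pam) tuples for NGG-adjacent sites on
    the plus strand, sorted so the closest matches come first."""
    guide = guide.upper()
    genome = genome.upper()
    n = len(guide)
    hits = []
    for i in range(len(genome) - n - 3 + 1):
        pam = genome[i + n : i + n + 3]
        if pam[1:] != "GG":
            continue  # no NGG PAM directly 3' of this window
        site = genome[i : i + n]
        mismatches = hamming(guide, site)
        if mismatches <= max_mismatches:
            hits.append((mismatches, i, site, pam))
    return sorted(hits)
```

A site with zero mismatches is the intended target (or a perfect duplicate of it); sites with one or more mismatches form the candidate list to be verified experimentally.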

Off-target screens relying on the functional properties of nucleases have the potential to focus screening efforts without user bias. Chromatin immunoprecipitation using Cas9 antibodies [83, 84] can detect sites of Cas9 interaction with the genome, but these sites are not related directly to DNA-cleavage events. Linear amplification-mediated high-throughput, genome-wide, translocation sequencing (LAM-PCR HTGTS) is a cumulative method that detects off-target cleavage by virtue of genomic translocations formed between nuclease-generated or even endogenous DSBs [85], indicating a two-break-minimum detection limit. On the other hand, single NHEJ events have been shown to capture foreign DNA elements such as integration-defective lentiviral vectors (IDLV) [86, 87], which can then act as tags for targeted sequencing efforts. GUIDE-seq applies this same principle, yet uses oligonucleotide tags compatible with next-generation sequencing to streamline sample processing and data integration [88]. BLESS (direct in situ breaks labeling, enrichment on streptavidin, and next-generation sequencing) attempts to capture a snapshot of the fragmented genome within cells, but requires complex fixation and manipulation steps [89]. DiGenome sequencing is an in vitro approach to genomic DSB detection that uses WGS to detect cleavage sites as aligned DNA fragment ends [90]. Differences in detection profiles for these methods may reflect the methodology, and candidate sites must ultimately be verified experimentally.

Refining the detection of off-target cleavage is a crucial endeavor, yet does not directly prevent the causative insult to the genome. Therefore, it would be prudent to develop engineering methods that minimize off-target cleavage events or increase on-target cleavage specificity. One straightforward approach could be to temporally limit the expression of Cas9 and sgRNAs. However, simply reducing the amount of expression vector DNA transfected does not reduce the relative rates of off-target cleavage [91]. In contrast to plasmids, which can express over periods of 3 to 4 days or even integrate randomly into the genome, delivery as in vitro transcribed (IVT) mRNA limits the nuclease expression window to 1 to 2 days and yet is still effective for on-target cleavage. An additional step toward restricted nuclease activity is to produce RNP particles through the in vitro combination of commercially available recombinant SpCas9 protein and IVT or synthetic sgRNAs, followed by delivery directly into iPSCs by electroporation or chemical transfection [92, 93]. An in-depth analysis of the off-target outcomes from such procedural changes is pending.

It is clear, however, that limiting nuclease activity temporally has the potential to reduce mosaicism under conditions normally presumed to produce clonal iPSCs. Mosaicism can arise from unique DNA cleavage and DSBR events in the daughter cells of nuclease-transfected iPSCs, resulting in two or more divergent populations in a drug-selected colony [94]. Mosaicism confounds the detection of off-target effects, which may be present below the threshold of detection in the total iPSC population. Interestingly, sib-selection procedures involving rounds of serial subcloning from the starting population [77] impose a temporal separation of nuclease treatment and physical cloning events to derive truly clonal iPSC populations.

Engineering native nuclease behavior to reduce or prevent off-target cleavage was initially proposed for recombinant FokI nuclease domains [95]. ZFNickases, in which the catalytic domain of one monomer in a ZFN dimer is inactivated, were shown to have lower levels of off-target mutagenesis, albeit with an overall reduction in on-target HDR activity [96]. Similarly, a derivative of SpCas9 in which the RuvC nuclease domain has been inactivated by mutagenesis (SpCas9n, D10A) acts as a DNA nickase [37]. This hobbled enzyme has been used in juxtaposed pairs to produce staggered nicks, and touted as having lower off-target cleavage activity than its fully active counterpart because rogue binding of a SpCas9n monomer would produce single-strand nicks rather than DSBs [97]. Yet, it is important to remember that nicked DNA intermediates can still be processed by NHEJ mechanisms, resulting in an off-target indel [98], suggesting that alternative approaches still require consideration.

With solution of the DNA/RNA hybrid-bound SpCas9 protein structure [99, 100] came the possibility of rational SpCas9 engineering. Modeling revealed a positively charged groove between the nuclease and PAM interacting domains, proposed to be involved in stabilizing the nontarget DNA strand. Mutagenesis of K848A, K1003A, and R1060A residues within the groove retained approximately 60 % on-target activity, while increasing sensitivity to sgRNA mismatches, most notably outside the 7–12 nt “seed” region [44]. In another approach, diminished bonding energy through quadruple mutagenesis of DNA-contacting N497A, R661A, Q695A, and Q926A residues produced a high-fidelity variant of SpCas9 (SpCas9-HF1) with undetectable off-target activity [101]. On-target activity was reported to be 70 % of the native SpCas9, a modest compromise for higher specificity.

Modulation of the RNA component of the Cas9 RNP complex has also been shown to positively affect on- and off-target cleavage ratios. It was suggested that truncated sgRNAs (truRNAs) may gain cleavage specificity as a trade-off for activity [102], yet the use of 16-nt rather than the standard 20-nt sgRNA molecules has not become the norm for CRISPR experiments. Optimal sgRNA design has been shown to affect on-target cleavage activity [103]. More recently, revision of the rule set governing sgRNA design by Doench and colleagues suggests a predictive scoring system for increasing on-target activity while avoiding off-target cleavage, as demonstrated using a genome-wide knockout screen [104]. It remains to be seen how the community at large will adopt these bioinformatic rule sets, and whether they hold true in various experimental situations.

Selection of Isogenic Clones and Technical Controls

Reprogramming technology captures the genome of the patient as a pluripotent cell resource, enabling in vitro modeling of disease that, by necessity, separates the cell from the patient. Differing from animal models, phenotyping results are therefore limited by the sophistication of in vitro cellular differentiation [105] and assay evaluation criteria. To directly link genotypes to phenotypes, appropriate control cell lines are of utmost importance.

Control iPSC lines represent a selected or engineered group of iPSCs that are genetically matched for the purpose of excluding erroneous variation and increasing the accuracy of disease studies [106]. iPSCs from unrelated normal individuals have been used to produce target cell disease controls [107], taking into account that their genetic backgrounds may vary by degrees (Fig. 4a). As such, the number of unrelated iPSC clones that must be analyzed in parallel to define a genotype–phenotype correlation increases in relation to the statistical power required (Fig. 4b). Within practical limits, the required number of control iPSC lines depends mainly on the strength and correlation of the in vitro phenotype with clinical presentation and the complex influence of genetic background variation on phenotypes observed within the patient population [108]. One standard for reducing genetic variation has been to compare disease iPSCs to normal iPSCs from unaffected siblings who share much of their genetic background with the affected donor by blood relationship (Fig. 4a). Intriguingly, as ESCs are often derived from pools of discarded in vitro fertilization (IVF) material [1], there may in fact be a higher rate of sibling relationship among publicly available ESCs than iPSC lines. However, potential racial bias and an association with a higher incidence of infertility-related alleles, along with a reported marked difference in differentiation capacity between ESC lines [109], may further offset this proposed benefit. On the other hand, the documented medical background of the donor combined with a deep genetic analysis may help predict the severity of phenotypic deviation between experimental and control iPSC lines.

Fig. 4. Appropriate sources of isogenic control iPSC clones. (a) iPSCs from unaffected siblings or normal donors are typically used as controls. (b) Multiple iPSCs may be used to reduce noise from clonal variation. (c) True isogenic controls may be produced through genome engineering. (d) To preclude phenotypic effects from off-target cleavage, different sgRNAs may be used to produce the same genomic modification.

Subtle differences in the genomes and epigenomes between iPSC lines may influence in vitro phenotypes [110]. Concerns over the accumulation of mutations throughout the reprogramming process as a result of proliferative stress have been raised [111]. Conversely, more recent studies have shown that iPSC derivation is inherently stable at the genetic level [112], suggesting that the risk of genetic drift arises during extended in vitro culture and is therefore similar for both ESCs and iPSCs. However, the process of iPSC derivation itself is selective, such that preexisting somatic mutations in the patient’s donor tissue can be clonally amplified [113, 114]. It has also been proposed that reprogrammed cells might retain an epigenetic memory of their somatic source, which could influence differentiation capacity [115]. Interestingly, such epigenetic memory has been disputed by the observation that more significant variation in differentiation capacity occurs as a result of genetic background than somatic tissue source [116, 117]. Still, these uncertainties regarding inherent and acquired phenotypic variation strongly argue the case for isogenicity.

Fortunately, iPSCs themselves are inherently isogenic with their donor, and through the application of subtle nuclease-mediated genome-editing approaches described above, gene-corrected iPSCs can be derived directly from donor iPSCs (Fig. 4c) [63]. Similarly, well-characterized normal iPSCs will retain isogenicity if converted to diseased iPSCs using nuclease techniques. When patient-specific iPSCs cannot be procured, recreating mutations by genome editing provides a novel material for the study of genetic effects on disease progression and severity in a defined genetic background. Quality-controlled normal iPSCs could be accessed from one of many proposed stem cell “libraries,” which aim to generate clinical-grade and HLA haplotype-matched control iPSC lines for therapeutic applications [118]. However, it should be cautioned that in the conversion of normal iPSCs into diseased iPSCs, disease phenotypes might be masked by protective alleles. Candidate gene disruption in multiple ethnic backgrounds may therefore be necessary to exclude complex genetic effects [119].

Phenotypic variations between experimental iPSC lines and isogenic controls may have a technical origin. Off-target nuclease effects require labor-intensive screening to detect and might contribute to the observed phenotype. As an alternative to deep sequencing or comparing multiple gene-corrected clones from a single experiment (Fig. 4b, c), it is advised to instead make use of a second sgRNA with its own distinct off-target profile (Fig. 4d). In this way, it is possible to rule out common off-target events between separately derived clones as a direct influence on phenotype. Similarly, when PB-mediated gene targeting and excision are employed for precise editing [120], reintegration of the transposon may occur stochastically. Yet, these clones may still prove useful for validating phenotypes, because each clone should represent a novel reintegration event. Splinkerette PCR-based methods for mapping reintegrations [121] could help predict the influence on genomic integrity and possible phenotypic changes. Considering these sources of technical variation and their logical solutions, genome engineering (Fig. 4c) stands as the strictest method to maintain isogenicity within control iPSCs.

Conclusions

The combination of iPSC and nuclease technologies, particularly CRISPR/Cas9, has generated a true paradigm shift in modeling human genetics and disease. Although more accessible than ESCs, patient-specific iPSCs still require informed consent, and can prove to be morally and monetarily extravagant research materials. With genome-editing technologies, there is no longer a need to initiate disease modeling with the procurement of patient-specific iPSCs. Instead, candidate mutations or allelic series may be first engineered singly or in combinatorial fashion into genetically and phenotypically defined “reference” ESCs or iPSCs. Once available, the panel of iPSC “standards” may be used to refine in vitro physiological assays. Finally, as required, patient-specific iPSCs may be screened using the optimized assay system to interrogate candidate mutations and the effect of native genetic background. Future avenues of research will most certainly entail combined gene editing and reprogramming strategies, bringing to fruition both preclinical and clinical applications of stem cell technology to personalized medicine.