1 Introduction

In the 1960s, George Streisinger and colleagues introduced the zebrafish as a new vertebrate model for the study of developmental genetics. Thanks to its small size, ease of breeding, and external fertilization producing large clutches of transparent embryos, the zebrafish appeared especially suited for morphological observation of developmental processes. Since then, the zebrafish has become a model of choice, not only for studying vertebrate development, but also for modeling human diseases and conducting molecule screening for drug discovery. Comparison of the zebrafish and human genomes indicates that 71.4 % of human protein-coding genes have at least one zebrafish orthologue [1]. Systematic large-scale forward genetic screens combined with the annotation of the zebrafish reference genome have led to the identification of a large variety of mutations affecting embryogenesis, physiology or behavior relevant to human health [2, 3]. More recently, an explosion of new tools and techniques, in particular the development of engineered endonucleases (EENs), has opened a new era for genome editing, allowing the direct manipulation of the zebrafish genome for targeted mutagenesis and transgenesis (Fig. 1). This chapter provides an overview of the different approaches used for mutagenesis and transgenesis in zebrafish, with an emphasis on the recent progress in targeted genetic manipulations, and their ‘translational’ applications to modeling selected brain disorders.

Fig. 1

Timeline recapitulating the development of mutagenesis approaches in zebrafish. This historical timeline recapitulates the major advances in the development of mutagenesis approaches in zebrafish over the last 40 years. The first ENU-based mutagenesis screens conducted in the Driever and Nüsslein-Volhard labs have generated several hundreds of mutants that were reported in a dedicated issue of Development in 1996. As such, no author name was reported in the corresponding box

2 Mutagenesis in Zebrafish

2.1 Chemical Genetic Screens

Forward genetic screens allow the identification of genes implicated in a specific biological pathway or process by screening a population of animals in which genome modifications have been randomly induced. Individuals that display a phenotype of interest are identified as carriers of a modified/mutated allele, and subsequent mapping of the modification/mutation responsible for the phenotype identifies the gene involved. Thanks to its external fertilization, its large clutch size, its rapid development and the transparency of its embryos, the zebrafish proved ideal for large-scale forward genetic screens. While initial work used UV light and γ-ray irradiation to trigger chromosomal breaks [4–6], N-ethyl-N-nitrosourea (ENU) rapidly became the standard choice for chemical mutagenesis [7–9]. ENU is a DNA alkylating agent that mostly triggers point mutations, has a high mutagenic efficiency, and can be directly applied to adult male zebrafish by adding it to the water, making it very easy to use (Fig. 2). The first large-scale ENU screens, performed in the Driever lab in Boston and the Nüsslein-Volhard lab in Tübingen, generated several hundred mutants with developmental phenotypes that were reported in a dedicated issue of Development in 1996 [10]. Other ENU screens have subsequently identified more genes involved in development [11–16], behavior [17–19], addiction [20], or diseases [21]. While the identification of the mutated alleles by positional cloning is often laborious, whole-genome sequencing at low coverage can now be used to map mutations rapidly [22, 23]. Two approaches using whole-genome sequencing have been developed: bulk-segregant linkage analysis (BSFseq), which involves a mapping cross, and homozygosity mapping (HMFseq) [24]. Both rely on bioinformatic filtering for mutagenic polymorphisms, and can be analyzed with the open-source computational pipeline MegaMapper available at https://wiki.med.harvard.edu/SysBio/Megason/MegaMapper.
More affordable approaches using transcriptome sequencing have also been developed, such as Mutation Mapping Analysis Pipeline for Pooled RNA-seq (MMAPPR) [25] and RNA-seq-based bulk segregant analysis [26]. MMAPPR offers the advantage of identifying mutations without sequencing the parental strain or using a SNP database.
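The principle behind sequencing-based mapping can be illustrated with a minimal sketch. It assumes that a pool of phenotypically mutant embryos has been sequenced and that, for each polymorphic position, the fraction of reads carrying the allele of the mutagenized strain is already known. Near the causative mutation this fraction should approach 1.0, because all mutants are homozygous there. The function name, the input layout and the window size are hypothetical choices made for illustration; this is not the algorithm implemented in MegaMapper or MMAPPR.

```python
from statistics import mean


def map_linked_interval(snps, window=3):
    """Return (score, start_bp, end_bp) for the window of consecutive SNPs
    with the highest mean mutant-strain allele frequency.

    snps: list of (position_bp, allele_frequency) tuples sorted by position,
          where allele_frequency is measured in the pooled mutant sample.
    """
    if len(snps) < window:
        raise ValueError("need at least `window` SNPs")
    best = None
    for i in range(len(snps) - window + 1):
        chunk = snps[i:i + window]
        score = mean(freq for _, freq in chunk)
        # Keep the first window that reaches the highest mean frequency.
        if best is None or score > best[0]:
            best = (score, chunk[0][0], chunk[-1][0])
    return best


# Toy data: frequencies drift toward 1.0 around 300-500 bp.
toy = [(100, 0.5), (200, 0.55), (300, 0.9), (400, 1.0), (500, 0.95), (600, 0.6)]
score, start, end = map_linked_interval(toy)
print(f"candidate interval {start}-{end} bp, mean frequency {score:.2f}")
```

Real pipelines additionally filter out known strain polymorphisms and weight positions by read depth, which this sketch omits.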

Fig. 2

Use of ENU mutagenesis in forward genetic screens and TILLING. (a) Overview of a classical ENU-based mutagenesis forward screen. Males are directly treated with ENU added to the water to induce random germline mutations. Mutated males are outcrossed to WT females to generate F1 families. F1 family members are outcrossed to WT to generate F2 families, whose members are randomly incrossed to produce F3 embryos. F3 embryos are examined for phenotypic defects, and their genomes are analyzed for identification of the causative mutation. (b) Overview of ENU-based TILLING. Males are directly treated with ENU to induce random germline mutations and are outcrossed to WT females to generate F1 families. Males from F1 families are sacrificed after cryopreservation of their sperm and sampling of their DNA for analysis. DNA from F1 males is screened for mutations in genes of interest. The cryopreserved sperm from identified carriers is then selected to generate F2 families using in vitro fertilization, and mutant embryos can be identified by genotyping

While originally used in forward screens that identify mutations after phenotypic analysis, ENU mutagenesis has also been applied in reverse genetic approaches, in which mutations are detected first and then associated with a phenotype. Targeting Induced Local Lesions in Genomes (TILLING) was first used to screen for desired mutated alleles in Arabidopsis, and was successfully adapted to the zebrafish 2 years later [27, 28]. In contrast to ENU-based forward genetic screens, in which phenotypic analysis is conducted at the F3 generation, DNA analysis and sperm cryopreservation are performed in F1 families in TILLING (Fig. 2b). After mutations have been identified, the cryopreserved sperm is used for in vitro fertilization to generate F2 families, whose carriers are isolated by genotyping. TILLING alleles are currently being generated and distributed to the zebrafish community by the zebrafish TILLING consortium initiated by the Moens lab at the Fred Hutchinson Cancer Research Center and the Solnica-Krezel lab at Washington University School of Medicine in St. Louis. Alleles can be requested online at http://webapps.fhcrc.org/science/tilling/index.php. Complementary to the zebrafish TILLING project, the Zebrafish Mutation Project from the Wellcome Trust Sanger Institute aims to produce a knockout allele in every protein-coding gene of the zebrafish genome, and has so far generated 26,634 alleles. Mutations are identified after whole-exome enrichment and Illumina next-generation sequencing, and each allele is analyzed for morphological defects [29]. A list of available mutant lines can be found online at http://www.sanger.ac.uk/sanger/Zebrafish_Zmpbrowse.

2.2 Retroviral and Transposon-Mediated Mutagenesis

While ENU mutagenesis is a powerful approach to generate random mutations at a high rate, it requires a significant degree of effort and commitment to identify the mutations. Insertional mutagenesis using retroviruses or transposons offers the advantage of a fast screening of carriers and a rapid identification of the mutated gene by using the sequence of the insertional element as a “tag” for mapping. Retroviruses and transposons have different insertion site preferences and generate null or hypomorphic alleles, or have no effect, depending on where they integrate in the genome.

Retroviral-mediated mutagenesis was the first insertional mutagenesis carried out in zebrafish in the early 1990s [30]. It used a pseudotyped retrovirus derived from the Moloney murine leukemia virus (MoMLV), with the envelope protein replaced by the glycoprotein from the vesicular stomatitis virus (VSV). As in human cells, this modified retrovirus was shown to preferentially integrate in regions close to transcriptional start sites in the zebrafish genome [31, 32]. It was used by the Hopkins lab at MIT and others to carry out several large insertional forward screens that led to the generation of hundreds of lines with developmental defects [33–37]. Like ENU mutagenesis and TILLING, retroviral mutagenesis has also been used in reverse genetics by injecting high-titer retroviruses into embryos [32]. Sperm from F1 males is cryopreserved, and mutations are mapped by identifying the genomic sequences flanking the insertion site, or by high-throughput Illumina sequencing to generate a proviral insertion library. Most retroviral insertions have been located in introns, with insertions into the first intron of genes often leading to a decrease in gene expression. Using this approach, the Lin lab at UCLA and the Burgess lab at NHGRI/NIH have generated the Zebrafish Insertion Collection (ZInC), in which 3054 mutations in genes have been isolated from 6144 F1 fish [38, 39]. Mutant lines can be searched for with the ZInC database (http://research.nhgri.nih.gov/ZInC/?mode=search) and requested through the Zebrafish International Resource Center (ZIRC) (http://zebrafish.org/home/guide.php).

Insertional mutagenesis using transposons has also been used for gene inactivation. Transposons, or “jumping genes”, are mobile DNA sequences that can change their position within the genome, thereby altering it and creating mutations. Insertions and excisions require the activity of the transposase enzyme. Several transposable elements including Sleeping Beauty, Ac/Ds and Tol2 have been used in zebrafish for both mutagenesis and transgenesis, Tol2 being the most common [40–42]. Transposons have been favored over retroviruses due to their ease of use and their ability to integrate large transgenes. Transposon-based gene-breaking constructs have been improved over the years to simultaneously inactivate a gene (“gene trap”) and insert transgenes of interest such as fluorescent proteins (“protein trap”) (Fig. 3a and b), or to inactivate a gene in a conditional manner. The FlipTrap cassette, for instance, allows conditional mutagenesis thanks to the insertion of loxP and FRT sites for Cre-mediated and Flp-mediated recombination, respectively [43]. When integrated into an intron of a gene, the FlipTrap cassette forms a fusion protein of citrine and the endogenous protein, thereby revealing the expression profile of the targeted gene when it is expressed. Exposure to Cre recombinase removes the citrine sequence and a splice donor sequence associated with it, thereby inducing a truncation of the gene product. Flp-mediated recombination allows the exchange of the cassette with any DNA sequence after the integration has occurred. Several FlipTrap lines have been made available through the FlipTrap database (http://www.fliptrap.org/static/anatomies.html). Other efficient gene-breaking constructs such as the RP2 cassette [44], the FlEx cassette [45] or a recently developed bipartite Gal4-containing vector [46] also function as conditional alleles thanks to the presence of loxP and/or FRT sites flanking the mutagenic cassette.
Transposon-based cassettes have been used to perform several mutagenesis screens [32, 47–49]. A list of gene trap fish lines obtained from the Kawakami lab is provided through the zTrap database (http://kawakami.lab.nig.ac.jp/ztrap/). The zTrap database allows the search for gene trap insertions located within or near genes of interest [50].

Fig. 3

Overview of transposon-based gene trap, protein trap and enhancer trap approaches. All transposon-based gene trap, protein trap and enhancer trap vectors contain transposable elements (TE) that mediate random integration of the vector in the genome. (a) In a gene trap approach, the vector contains a splice acceptor site (SA) upstream of a reporter sequence with a stop codon and a polyA (pA) signal at its 3′ end. Because the reporter does not have any start codon, its expression depends on the regulation of the endogenous gene by the upstream regulatory element (enhancer). Proper splicing of the SA to the 5′ exon of the gene integrates the reporter into the transcript and generates a truncated protein. (b) In a protein trap approach, the vector contains both an SA and a splice donor site (SD) flanking the reporter sequence. The reporter is devoid of start and stop codons, allowing the fusion between the reporter and the endogenous transcript when integration in an intron is in the correct orientation and proper reading frame. (c) In an enhancer trap, the vector contains a basal promoter with minimal activity upstream of a reporter sequence with a start codon, a stop codon and a pA signal. When the vector integrates near an endogenous transcriptional enhancer, its basal promoter becomes regulated by it and drives the expression of the reporter without any mutagenic effect

2.3 Targeted Mutagenesis

While ENU, retroviruses and transposon-based constructs are powerful mutagenesis tools for forward genetic screens, the genome modifications they generate are random, making it laborious and time-consuming to isolate a mutant for a gene of interest. For many years, methods for engineering specific loci in the genome were restricted to organisms like the mouse, in which embryonic stem cells can be manipulated in a precise way through homologous recombination (HR). In zebrafish, transient gene knockdown was achieved by injecting antisense morpholino oligonucleotides (MOs) designed to block the splicing or translation of a targeted mRNA [51]. MOs have been widely used to test gene function, but have recently raised some concerns regarding their specificity [52, 53]. A comparative study looking at more than 80 genes notably reported that around 80 % of the phenotypes observed in MO-injected embryos (“morphants”) could not be detected in the corresponding mutants [54]. These differences have led to the assumption that MO-induced phenotypes often result from off-target effects, and that mutants should become the standard model to describe gene function. On the other hand, deleterious mutations have recently been shown to activate genetic compensatory mechanisms [55]. Further investigation will likely be required to explain the discrepancies between morphant and mutant phenotypes for a specific gene.

The ability to precisely manipulate the zebrafish genome remained a long-standing quest that was only recently resolved by the discovery of sequence-specific endonucleases and their engineering as genome editing tools. All engineered endonucleases (EENs) consist of a sequence-specific DNA targeting component (protein domain or RNA) and a double-stranded DNA cleaving endonuclease (catalytic domain) that introduces double-strand breaks (DSBs) in the genome. DSBs can be repaired by two different pathways: non-homologous end joining (NHEJ) and homology-directed repair (HDR). NHEJ can ligate the cleaved DNA double strands without any template but introduces insertions or deletions (indels) at the cut site. HDR, on the other hand, uses a homologous template of DNA to repair DSBs. NHEJ is ten times more active than HDR or HR during zebrafish development [56–58]. This error-prone repair mechanism is exploited to introduce a frameshift mutation leading to a non-functional protein. Indel mutations generated by NHEJ are easily detected by analyzing the formation of heteroduplexes between mutant and wild-type (WT) alleles, either by a mobility assay, in which heteroduplexes and homoduplexes have different electrophoretic migration profiles [59], by using enzymes like the endonucleases Surveyor or Cel-I or the bacteriophage resolvase T7E1 that recognize and cut mismatches [60–62], or by high resolution melt curve analysis (HRMA) [63–65]. The nature of indels can be further characterized by directly analyzing Sanger sequencing data with the Poly Peak Parser software available at http://yost.genetics.utah.edu/software.php [66]. Several EENs including zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and RNA-guided nucleases (CRISPR/Cas9) have been used successfully in zebrafish for targeted mutagenesis (Fig. 4), each of them presenting its own advantages (Table 1).
A searchable database, EENdb, collects reported TALENs, ZFNs and CRISPR/Cas systems for different organisms including zebrafish and can be accessed at http://eendb.zfgenetics.org/ [67]. Another tool, ZiFiT, can be used to design ZFNs, TALENs, or CRISPRs and is available at http://zifit.partners.org/ZiFiT/Introduction.aspx [68].
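Whether an NHEJ-induced indel shifts the reading frame depends only on its net length: a net change that is not a multiple of 3 bp is expected to cause a frameshift. The following minimal sketch makes that rule explicit. The function name and the assumption that both alleles cover exactly the same coding interval are illustrative choices and are not taken from any of the tools cited above.

```python
def classify_indel(wt_allele, mutant_allele):
    """Compare a WT and a mutant allele covering the same coding interval.

    Returns a dict with the net length change (bp) and whether that change
    is expected to shift the reading frame (net length not divisible by 3).
    """
    net = len(mutant_allele) - len(wt_allele)
    if net == 0:
        kind = "no net length change"
    elif net > 0:
        kind = "net insertion"
    else:
        kind = "net deletion"
    return {"net_bp": net, "kind": kind, "frameshift": net % 3 != 0}


# A 2 bp deletion shifts the frame; a 3 bp deletion removes one codon in frame.
print(classify_indel("ATGGCCATT", "ATGGATT"))
print(classify_indel("ATGGCCATT", "ATGATT"))
```

An in-frame indel can still be deleterious if it removes or alters critical residues, so the flag is only a first-pass filter.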

Fig. 4

Engineered endonucleases for targeted mutagenesis. (a) ZFNs are composed of zinc finger arrays (ZFAs) fused to the catalytic domain of the FokI endonuclease. Each ZFA generally consists of three fingers, each of which recognizes and binds to a specific 3 nt DNA sequence. Since FokI becomes active upon dimerization, ZFNs work in pairs, cleaving DNA only after each of them has bound to its target sequence. (b) TALENs are constructed by fusing the catalytic domain of FokI to the DNA-binding transcription activator-like effector (TALE) proteins. Each TALE contains an N-terminal translocation domain that recognizes a 5′-T (in red in the DNA sequence), a DNA-binding central repeat domain, and a C-terminal sequence. The central domain contains repeat units composed of 33–35 conserved amino acids, with differences at amino acids 12 and 13 that form the repeat variable di-residue (RVD). Each RVD recognizes and binds to a single specific nucleotide and is therefore responsible for the DNA binding specificity of each repeat unit. Like ZFNs, TALENs function in pairs to cleave DNA. (c) In the CRISPR/Cas9 system, a single guide RNA (sgRNA) recruits the endonuclease Cas9 to the genomic sequence it complements. The sgRNA is composed of a 20 nt sequence that directly matches the DNA target sequence, followed by 72–80 nt of the bacterial crRNA/tracrRNA sequence that are required for the formation of hairpin loops stabilizing the sgRNA. Cas9 has two catalytic domains, RuvC and HNH, each of which cleaves one DNA strand. The presence of NGG as a protospacer adjacent motif (PAM) is required 3′ of the target sequence for DNA recognition by Cas9

Table 1 Comparison of mutagenesis approaches available in zebrafish

2.3.1 ZFNs

First described in 1996 [69], ZFNs are chimeric proteins composed of a DNA-binding zinc finger array (ZFA) fused to the catalytic domain of the non-specific bacterial endonuclease FokI that becomes active upon dimerization (Fig. 4a). Each ZFA generally contains three small Cys2His2 zinc fingers derived from natural transcription factors (“Cys2His2” corresponds to the four residues that coordinate the zinc atom), with each finger recognizing and binding to a specific 3 bp DNA sequence. Many fingers recognizing 5′-GNN, 5′-ANN and 5′-CNN triplets (with N being any base) have been isolated using phage display, and a catalogue of fingers and their binding preferences has been generated [70–77]. While in theory fingers can be assembled into any combination to construct a ZFA against any sequence of interest, designing ZFAs with specific and efficient DNA binding activities has been a challenge, as the interaction of each finger with DNA is context dependent. Several methods involving direct assembly or screening strategies have been developed to generate efficient ZFAs. Modular assembly (MA) directly ligates fingers that recognize different triplets, but does not take into account the context-dependent effects of the DNA sequence, leading to a rather high failure rate [78]. The best success has been achieved using targets composed of 5′-GNN triplets [79]. In contrast to MA, oligomerized pooled engineering (OPEN) uses a bacterial two-hybrid selection method to identify ZFAs with high efficiencies and high affinities from a combinatorial library of multi-finger arrays recognizing 9 bp sequences [80, 81]. A similar approach with a one-hybrid selection system has also been used [82]. While more efficient, these approaches require expertise in constructing libraries and are quite labor-intensive.
A more recent and easier method named Context-dependent assembly (CoDA) assembles three-finger arrays by selecting N- and C-terminal fingers from known ZFAs containing a common middle finger, thereby accounting for context-dependent effects between adjacent fingers [83].
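Because each finger reads one 3 bp triplet, a three-finger array reads a 9 bp half-site, and half-sites built entirely from 5′-GNN triplets are the ones reported above to work best with modular assembly. As a rough illustration, the sketch below scans one strand of a sequence for such 9 bp sites. The function name is hypothetical, and the scan deliberately ignores spacer length, the paired half-site on the opposite strand, and context-dependent effects, all of which dedicated design tools take into account.

```python
import re

# Three consecutive G-N-N triplets; the lookahead lets overlapping sites match.
GNN_SITE = re.compile(r"(?=((?:G[ACGT]{2}){3}))")


def find_gnn_sites(sequence):
    """Return (start_index, site) for every 9 bp (GNN)3 site on the given strand."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in GNN_SITE.finditer(sequence)]


# One (GNN)3 site, GAT-GCC-GTT, starts at index 2 of this toy sequence.
print(find_gnn_sites("AAGATGCCGTTAA"))
```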

Since the FokI endonuclease domain must dimerize to be active [84], ZFNs function in pairs, cleaving DNA only after each of them has bound to its target sequence (Fig. 4a). Obligate heterodimer modifications have been introduced in the FokI catalytic domain to increase ZFN efficacy and reduce off-target cleavage [85, 86]. The spacer that separates the ZFA target sequences is relatively short, of variable size and has no sequence requirement. Two to five amino acids can be introduced between the ZFA and the FokI domain as an inter-domain linker to accommodate the variable size of the spacer in the DNA sequence [87, 88]. While the requirement of two ZFNs to target a sequence provides good specificity and limits off-target effects, finding a target sequence in the 5′ region of a gene to generate a null mutation can be limited by the context-dependent affinity of each zinc finger within a ZFA. Nonetheless, ZFNs have been successfully employed for gene targeting in zebrafish since the first reports of their use [82, 89, 90]. ZFN target sites can be identified in several organisms including zebrafish with ZFNGenome, a comprehensive open-source resource accessible at http://bindr.gdcb.iastate.edu/ZFNGenome/ [91]. Once the ZFNs have been assembled, mRNAs encoding them are injected at the one-cell stage.

2.3.2 TALENs

While useful for targeted mutagenesis, ZFNs have rapidly been challenged by the development of TALENs, which appear to be more mutagenic in zebrafish [92, 93]. TALENs are chimeric proteins obtained by fusing the DNA-binding transcription activator-like effectors (TALEs) to the catalytic domain of FokI (Fig. 4b). TALEs were originally identified in the bacterial plant pathogen Xanthomonas and were named for their ability to trigger the expression of genes promoting infection in the host cell [94].

Each TALE is composed of an N-terminal translocation domain that recognizes a 5′-T, a DNA-binding central repeat domain, and a C-terminal sequence. The central domain contains 15.5–19.5 repeat units composed of 33–35 conserved amino acids, with differences at amino acids 12 and 13 forming the repeat variable di-residue (RVD) (the last repeat unit contains only 20 amino acids and is referred to as a half repeat). Each RVD recognizes and binds to a single specific nucleotide and is therefore responsible for the DNA binding specificity of each repeat unit. The RVDs NI, HD and NG are commonly used to target the nucleotides A, C and T, respectively, while NN, NK and NH can be employed for targeting a guanine, with NK and NH binding more specifically but with a weaker affinity [95–97]. In contrast to ZFNs, whose efficiency is context-dependent, TALENs have few requirements in terms of the targeted sequence besides a 5′-T and a minimum length of 11 RVDs for the binding domain [98]. Several online tools such as TALE-NT [99] (https://tale-nt.cac.cornell.edu/node/add/talen), Mojo Hand [100] (http://www.talendesign.org/), or idTALE [101] (http://omictools.com/idtale-s5415.html) can be used to identify the optimal TALEN target sequence within a gene of interest.
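The one-RVD-per-nucleotide code described above lends itself to a simple lookup. The sketch below translates a target half-site into the RVD series of a single TALE, using the NI/HD/NG assignments and the choice of NN, NK or NH for guanine given in the text, and enforcing the 5′-T and the minimum of 11 RVDs. The function name and argument layout are hypothetical, and the sketch does not pair two TALEs or choose a spacer, which the design tools listed above handle.

```python
# RVD assignments as stated in the text; guanine can use NN, NK or NH.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "T": "NG"}
GUANINE_RVDS = {"NN", "NK", "NH"}


def design_rvds(site, guanine_rvd="NN", min_rvds=11):
    """Translate a TALE half-site (5'->3', starting with the required T)
    into its list of RVDs. The leading T is read by the N-terminal domain,
    so RVDs are assigned to the bases that follow it."""
    site = site.upper()
    if guanine_rvd not in GUANINE_RVDS:
        raise ValueError("guanine_rvd must be NN, NK or NH")
    if not site.startswith("T"):
        raise ValueError("target site must be preceded by a 5'-T")
    bound = site[1:]
    if len(bound) < min_rvds:
        raise ValueError(f"binding domain needs at least {min_rvds} RVDs")
    table = dict(RVD_FOR_BASE, G=guanine_rvd)
    try:
        return [table[base] for base in bound]
    except KeyError as err:
        raise ValueError(f"unexpected base {err.args[0]!r}") from None


print(" ".join(design_rvds("TGATCCGATCAG")))
```

Passing `guanine_rvd="NH"` or `"NK"` trades some affinity for specificity at guanine positions, as noted in the text.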

Like ZFNs, TALENs are engineered with the FokI endonuclease catalytic domain as an obligate heterodimer and must therefore work in pairs to cleave DNA. The optimal spacer length seems to depend on the scaffold of the TALEN, with TALENs containing short C-terminal lengths (17–28 amino acids) being more efficient with shorter spacers (12–14 bp). Several methods have been developed for constructing TALENs and dictate the scaffold used. The Golden Gate cloning strategy uses restriction digest of a TALE plasmid library by type IIS endonucleases followed by ligation [102]. This approach is theoretically a one-step assembly that can construct TALE repeats in one digest and ligation reaction. While several Golden Gate derived methods have been generated, and some are commercially available, they do not always use the same TALE scaffold and are not always compatible [103–107]. Other approaches such as the Unit assembly method [108, 109] and the restriction enzyme and ligation (REAL) method [110] rely on standard molecular cloning using hierarchical restriction digests and ligations. While effective, these methods are labor-intensive and do not allow the construction of TALENs on a large scale. High throughput can be achieved with the fast ligation-based automatable solid-phase high-throughput (FLASH) or the iterative capped assembly (ICA) methods that use solid-phase ligation on magnetic beads instead of the time-consuming transformation and growing of bacteria [111, 112].

Due to their higher mutation frequencies, the rarity of off-target effects, and the presence of target sequences in almost every gene, TALENs have quickly become the method of choice for mutagenesis in zebrafish since their first application in 2011–2012 [64, 109, 113–115]. As for ZFNs, mRNAs encoding TALENs are injected at the one-cell stage. TALEN efficiency can be assessed the following day by analyzing heteroduplex formation in injected embryos.

2.3.3 CRISPR/Cas9 System

The CRISPR (clustered regularly interspaced short palindromic repeats)/Cas (CRISPR-associated proteins) system was originally identified as a defense mechanism used by bacteria and archaea against the introduction of foreign nucleic acids from bacteriophages and exogenous plasmids [116–118]. Invading nucleic acids are first recognized as foreign and integrated as spacers between short DNA repeats (the CRISPR locus) in the host genome, thereby forming CRISPR arrays. Transcription of these CRISPR arrays generates primary transcripts, or pre-CRISPR RNAs (pre-crRNAs), that are subsequently cleaved into small CRISPR RNAs (crRNAs). Upon infection, the crRNAs whose spacers have a sequence close to the invading nucleic acids bind to them, and recruit a second non-coding RNA with partial complementarity to the crRNA named auxiliary trans-activating crRNA (tracrRNA). The tracrRNA/crRNA complex in turn recruits nucleases associated with the CRISPR locus, named Cas, to degrade the intruder nucleic acids and prevent pathogen invasion. Of particular interest is Cas9, an endonuclease that introduces DSBs in the target DNA thanks to its two nuclease active sites, RuvC and HNH, each of which cleaves one DNA strand (Fig. 4c). Several groups saw the genome-editing possibilities offered by the CRISPR/Cas9 system and adapted it for use in eukaryotic cells. The crRNA and tracrRNA of Streptococcus pyogenes were fused into a single guide RNA (sgRNA) named for its ability to recruit and activate Cas9 [119]. In parallel, the sequence of the Streptococcus pyogenes Cas9 was modified by codon optimization and the introduction of nuclear localization signals to promote its use in eukaryotic cells [119–121].

Since the targeting properties of the CRISPR/Cas9 system only rely on the sequence of the sgRNA, it has become very easy to target any sequence of interest in the genome. Each sgRNA is composed of a 20 nt sequence that directly matches the target sequence, followed by 72–80 nt of the 3′ crRNA/tracrRNA sequence that are required for the formation of hairpin structures stabilizing the sgRNA [122, 123]. The only constraint for the design of sgRNAs is the presence of NGG at the 3′ end of the target site, which acts as a protospacer adjacent motif (PAM) required for DNA recognition by Cas9 and its subsequent activation (Fig. 4c) [124–127]. The requirement of NGG as a PAM currently limits the number of sequences recognized by Cas9, but a recent study has successfully engineered efficient Cas9 derivatives with altered PAM specificities, thereby expanding the repertoire of usable PAMs [128]. Several servers and online software tools have been specifically developed for the design of sgRNAs, including CRISPRdirect (http://crispr.dbcls.jp/) [129], the Optimized CRISPR Design (http://crispr.mit.edu/) from the Zhang lab, the Cas9 Online Designer (http://cas9.wicp.net/) developed by Dayong Guo, Cas-Designer (http://rgenome.net/cas-designer/) [130], sgRNACas9 (http://www.biootools.com/) [131], and CHOPCHOP (https://chopchop.rc.fas.harvard.edu/) [132].
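The design constraint described above, a 20 nt target immediately followed by an NGG PAM, can be expressed as a short scan over both strands of a sequence. The sketch below is only an illustration of that rule: the function names are hypothetical, coordinates on the minus strand refer to the reverse-complemented sequence, and no scoring of efficiency or specificity is attempted, unlike in the dedicated design tools listed above.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sgrna_targets(sequence, protospacer_len=20):
    """Return (strand, start, protospacer, pam) for every site followed by NGG.

    `start` is the 0-based index of the protospacer on the scanned strand
    ('+' = input sequence, '-' = its reverse complement).
    """
    sequence = sequence.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    hits = []
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for m in pattern.finditer(seq):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits


for hit in find_sgrna_targets("GATTACAGATTACAGATTACAGGTT"):
    print(hit)
```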

Because of its ease of use and affordability, the CRISPR/Cas9 system has rapidly been applied in zebrafish, with mutagenesis rates comparable or superior to those obtained with TALENs [133–136]. sgRNAs are obtained by in vitro transcription from plasmids or oligos and co-injected with Cas9 mRNA at the one-cell stage to induce DSBs in the target sequence. Some studies have also directly injected the Cas9 protein with sgRNAs to increase mutagenic activity [137–139]. A major advantage of the CRISPR/Cas9 system is that its high efficiency is sometimes sufficient to introduce extensive biallelic mutations causing phenotypes in injected embryos, a feature not often seen using TALENs [140]. In addition, it offers the possibility to target multiple sequences simultaneously by co-injecting several sgRNAs, or by using a plasmid with multiple sgRNA cassettes under the control of U6 or H1 promoters, a process named CRISPR multiplexing [140–143]. Multiplexing has recently been employed in a high-throughput mutagenesis set-up to successfully generate mutations in 83 different genes in the zebrafish genome [144]. At a smaller scale, multiplexing can be very useful to generate double or triple mutants in related genes for which single mutants would lack a phenotype due to compensatory mechanisms. It can also be employed to study the role of non-coding RNA genes, which are not affected by changing the frame of translation. The identification of optimal targets in multiple loci has been facilitated by the recent development of specialized software such as CRISPRseek (http://www.bioconductor.org/packages/release/bioc/html/CRISPRseek.html) [145] or CRISPR MultiTargeter (http://www.multicrispr.net/) [146].
A final advantage of the CRISPR/Cas9 system is the possibility of disrupting gene function in a spatially controlled manner by injecting a modular vector that contains an sgRNA cassette under the control of a U6 promoter and a Cas9 cassette under the control of a tissue- or cell-specific promoter [147].

Altogether, the CRISPR/Cas9 system offers so many advantages that it has quickly become the method of choice for targeted mutagenesis in zebrafish. A major drawback, though, is the rather high frequency of off-target effects observed in various models [122, 123, 148–150]. The short length of the target sequence, and the tolerance of Cas9 for mismatches between the target sequence and the sgRNA, can lead to the mutation of secondary targets in the genome that would need repeated outcrossing to be eliminated. Choosing unique target sequences using the software tools mentioned above is thus important. Specificity can be further improved by using truncated sgRNAs (17 nt) that have a decreased mutagenesis rate at off-target sites [151]. Finally, Cas9 variants possessing only one nuclease catalytic site instead of two can be used [122, 152, 153]. These Cas9 “nickases” introduce nicks in one DNA strand only, and must be used in pairs with two sgRNAs to introduce DSBs. This system is thus analogous to TALENs or ZFNs in requiring a dual recognition of the targeted DNA sequence.
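The notion of a secondary target can be made concrete by counting mismatches between the guide sequence and every PAM-adjacent site in a reference sequence. The sketch below flags sites that carry an NGG PAM and differ from the guide at a small number of positions. The function names and the default threshold of three mismatches are arbitrary illustrative choices, only the given strand is scanned, and position-dependent mismatch tolerance is not modeled.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def candidate_off_targets(guide, reference, max_mismatches=3):
    """Return (start, site, mismatches) for sites on the given strand that are
    followed by an NGG PAM and differ from the guide at 1..max_mismatches
    positions. Perfect matches (the intended target) are not reported."""
    guide, reference = guide.upper(), reference.upper()
    n = len(guide)
    hits = []
    # The last valid start leaves room for the site plus a 3 nt PAM.
    for i in range(len(reference) - n - 2):
        site = reference[i:i + n]
        pam = reference[i + n:i + n + 3]
        if pam[1:] == "GG":
            d = count_mismatches(guide, site)
            if 0 < d <= max_mismatches:
                hits.append((i, site, d))
    return hits


guide = "ACGT" * 5
reference = guide + "TGG" + "CC" + "ACGTACGTACGTACGTACGA" + "AGG"
print(candidate_off_targets(guide, reference))
```

A guide for which this list is empty across the whole genome, on both strands, would be a reasonable starting candidate under these simplified assumptions.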

2.4 Chromosomal Deletions and Inversions

In addition to generating small indel mutations in a gene of interest, large genomic deletions or inversions can be introduced by injecting several TALEN pairs or multiple sgRNAs. DSBs are introduced simultaneously at two separate sites, leading to the loss, or more rarely the inversion, of the DNA fragment in between. In zebrafish, genomic deletions with sizes ranging from several hundred bases to 1 Mb have been reported [141, 154, 155]. Introducing large deletions has proved useful in different systems to study the role of cis-regulatory sequences [156, 157], or to recreate translocations similar to those found in human diseases [158–160].

3 Transgenesis in Zebrafish

Transgenesis is defined as the introduction of exogenous genes, or “transgenes”, into the genome of a living organism. The first zebrafish transgenic lines were obtained by the random integration of transgenes and regulatory promoters in the genome using retroviruses or transposon-based systems. By expressing transgenes such as fluorescent proteins or genes with dominant negative mutations, and by providing a spatial and/or temporal control of gene activation, these transgenic lines proved to be powerful tools for observing the fate and behavior of cells and tissues, studying gene regulation, and testing gene function in development, behavior and diseases. More recently, TALENs and the CRISPR/Cas9 system have revolutionized zebrafish research by allowing the insertion of sequences into specific loci in the genome, making the generation of knock-ins (KIs) finally possible.

3.1 Transposon-Based Transgenesis with Random Insertion

3.1.1 Enhancer Traps and Protein Traps

As mentioned previously for transposon-based mutagenesis, transposon-based constructs have been used in gene trap, enhancer trap or protein trap configurations (Fig. 3) in the context of high-throughput screens. All constructs possess transposable elements (TE) derived from Sleeping Beauty or Tol2 transposons that allow random integration in the genome. While gene traps are used for insertional mutagenesis (discussed earlier in Section 2.2), enhancer traps have a limited mutagenic effect and are designed to report the transcriptional activity of enhancers located near their site of integration (Fig. 3c). Protein traps are constructed to create a fusion between the full-length trapped gene and the reporter, allowing the visualization of protein expression in the embryo (Fig. 3b).

Numerous enhancer trap screens have been conducted in zebrafish and have led to the creation of a large library of transgenic lines expressing fluorescent reporters or drivers in specific cells and tissues. Several basal promoters have been employed, including keratin 4 (krt4) and keratin 8 (krt8), gata2, hsp70, c-fos, E1b, ef1a, thymidine kinase, the carp β-actin promoter (TKBA), and the medaka edar locus [161–170]. These basal promoters have various trapping efficiencies and can drive different expression profiles based on their sensitivity to the genomic enhancer regulating them [163]. Although all basal promoters have been useful to reveal specific patterns of expression during development, some have a bias for traps with expression in specific structures (for instance, the E1b promoter has a strong bias for cranial ganglia), while others can drive non-specific background expression in tissues such as the muscles or the dermis. Several reporters have also been used, the most common being fluorescent reporters like EGFP, to monitor transcriptional activity and follow the movement or differentiation of the labeled cells, and Gal4, which drives effector gene expression through the Gal4/UAS system wherever Gal4 is expressed. Several Gal4 enhancer trap lines have been generated in combination with a UAS:EGFP or UAS:Kaede reporter, where the photo-convertible Kaede fluorescent protein can be used for mapping neural circuits or cell lineages [163, 164]. A collection of enhancer trap lines is described in the ZETRAP 2.0 database available at http://plover.imcb.a-star.edu.sg/webpages/home.html [171].

In parallel to enhancer traps, protein traps have been developed to generate an in-frame fusion between the full-length trapped gene and the reporter (Fig. 3b). By retaining all the regulatory sequences of the endogenous genes, this approach allows detailed studies on the expression of the trapped protein and its regulation, as well as its localization within cells. As mentioned previously, protein traps have mostly been combined with gene traps to allow simultaneous gene inactivation and protein inactivation [43].

3.1.2 The Gateway System for Easy Transgenesis

While enhancer and protein trap constructs have been instrumental for visualizing developing tissues or protein localization, they are not particularly suited for overexpressing a gene of interest in a temporally or spatially controlled manner. Transgenesis is an essential tool for testing gene and cell function, but has historically been difficult in zebrafish due to technical limitations such as laborious conventional cloning and low rates of germline transmission when using supercoiled or linear DNA [172, 173]. To overcome these limitations, the Tol2Kit system, which uses recombination-based cloning of multiple DNA fragments, was designed to easily generate expression constructs for transgenesis [174]. This multisite Gateway technology relies on the att site-specific recombination system from the λ phage [175], and uses different engineered att sites that recombine specifically to assemble up to five DNA fragments in a directional manner. Three different “entry” clones containing a promoter, a coding sequence of interest, and a polyA or a tag, respectively, are recombined into a “destination” vector that also possesses Tol2 recombination elements for integration in the genome with high efficiency. The resulting plasmid is co-injected with transposase mRNA at the one-cell stage for transient or stable transgenesis. Carriers of the transgene are usually easily identified by the expression of a reporter gene. A main advantage of this approach is its modularity, which allows the generation of libraries of entry clones with promoters or genes of interest. For instance, entry clones with the promoter element from the hsp70 gene [176] or a UAS promoter have been generated for conditional expression. A list of essential Tol2Kit clones can be requested online at http://tol2kit.genetics.utah.edu/index.php/Main_Page.

By providing a simple, affordable and flexible system to generate transgenesis constructs, the Tol2Kit has greatly facilitated zebrafish research, promoting the sharing of clones within the zebrafish community and making transgenesis available to any lab. Several labs have expanded the number of clones using the Gateway technology and made their resources available (http://lawsonlab.umassmed.edu/gateway.html). To date, a list of 14,524 transgenic lines generated with either “trap” or Gateway constructs can be viewed on ZFIN at http://zfin.org/action/fish/search.

3.2 Targeted Transgenesis and the Generation of Knock-Ins

By providing an efficient approach to integrate DNA constructs into the genome, transposon-derived elements have been instrumental for the study of gene function and tissue morphogenesis in zebrafish. However, transposon-mediated integrations occur randomly, precluding precise genome editing. TALENs and CRISPR/Cas9, on the other hand, allow targeted engineering and have recently been employed to insert small or large sequences at precise loci in the genome. Several methods involving the NHEJ or the HDR pathways have successfully led to the generation of the first KIs in zebrafish.

3.2.1 Integration via the HDR Pathway

DNA integration mediated by HDR has been achieved with both TALENs and the CRISPR/Cas9 system. Several templates including linearized plasmid and single-stranded DNA (ssDNA) have been used with various integration efficiencies.

One of the first reports of gene targeting via homologous recombination (HR) in zebrafish used TALENs and a linearized DNA vector containing the cassette to be inserted, flanked by sequences homologous to the genomic target of around 800 and 900 bp on either side [177]. Several cassettes with loxP, eGFP, or eGFP-stop sequences were used to modify three different loci in the genome. The authors co-injected the linearized donor plasmids with TALEN mRNAs at the one-cell stage and were able to detect HR between the donor plasmids and the endogenous loci, with transmission to the germline in one case, albeit with low efficiency (about 1.5 %). Subsequent studies demonstrated that the length of homology arms as well as the configuration of the targeting construct have a significant impact on the efficiency of HDR [178]. In particular, increasing the length of the left and right arms to 1 and 2 kb, and introducing a DSB in the shorter homology arm, were shown to greatly improve the efficiency of HR and germline transmission (over 10 %).

HDR has also been achieved using ssDNA with short homology arms as a template together with TALENs [113] or the CRISPR/Cas9 system [133, 134, 136]. Short fragments encoding restriction sites (6 bp) or loxP sites (34 bp) have been successfully integrated after co-injecting TALEN mRNAs and ssDNA oligonucleotides with short homology arms of 20 and 18 bp [113]. Interestingly, in that case, increasing the length of homology arms seemed to reduce the frequency of HDR integrations. While germline transmission of the integrated DNA could be observed in 10 % of the cases, a major drawback of this approach was the frequent imprecise integration of the donor DNA with additional indel mutations. Similar results were obtained after co-injecting ssDNA oligonucleotides with sgRNAs and Cas9 mRNA [133, 134, 136]. All studies reported so far achieved precise integration of the template in the targeted genome location with various efficiencies. However, in all cases, imprecise repair events were frequently detected as a probable result of NHEJ. Inhibiting the NHEJ pathway by blocking the activity of endogenous DNA ligase IV with the Scr7 inhibitor has recently been shown to increase the efficiency of HDR-mediated genome editing in mammalian cells and mice [179, 180], and might lead to similar improvement in zebrafish in the future.

3.2.2 Integration via the NHEJ Pathway

Considering the prevalence of NHEJ in zebrafish during development [56–58], recent studies have exploited the NHEJ pathway to elicit targeted integration of large donor DNAs [181, 182]. In this approach, the donor vector contains a short sequence bearing the TALEN or the CRISPR target site upstream of the cassette to be integrated. Co-injection of this donor plasmid with sgRNA and Cas9 mRNA (or alternatively TALEN mRNAs) leads to the concurrent cleavage of the plasmid and the genomic target, and the subsequent integration of the linearized plasmid at the genomic target by an NHEJ repair mechanism. Alternatively, two different sgRNAs can be used, one for cleaving the genomic target, and the other for cleaving the donor plasmid. This method has proved to be very efficient for generating KI alleles, with rates of germline transmission over 30 %. However, since the integration of the donor plasmid can occur in either forward or reverse orientation and in three different frames, screening efforts are necessary to isolate the appropriate lines. Introducing short homologous sequences flanking the sgRNA target site into the donor seems to improve the precision of integration by involving both NHEJ and HDR mechanisms [183]. Alternatively, selecting an sgRNA target in the intron of the gene sequence, and using a donor vector with a homologous arm spanning from that sgRNA site to the 3′ region of the targeted gene, can be employed to circumvent the requirement of in-frame insertions and increase KI efficiency [184].
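
The screening burden mentioned above follows from simple counting: two orientations combined with three reading frames give six possible junction classes, of which only one places the cassette in the forward orientation and in frame. The Python sketch below enumerates these classes. It assumes, purely for illustration, that all six classes are equally likely, which real integrations need not satisfy; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def is_in_frame(upstream_coding_nt: int, junction_indel: int = 0) -> bool:
    """True when the cassette continues the upstream reading frame.

    upstream_coding_nt: coding nucleotides between the start codon and the cut.
    junction_indel: net bases gained (+) or lost (-) at the NHEJ junction.
    Assumes the cassette itself begins at the first base of a codon.
    """
    return (upstream_coding_nt + junction_indel) % 3 == 0


def usable_fraction() -> Fraction:
    """Fraction of orientation/frame classes that yield a functional fusion,
    under the simplifying assumption that all classes are equally likely."""
    outcomes = list(product(("forward", "reverse"), (0, 1, 2)))
    usable = [o for o in outcomes if o == ("forward", 0)]
    return Fraction(len(usable), len(outcomes))
```

Under this uniform assumption, `usable_fraction()` evaluates to 1/6, which gives a rough sense of why several founders may need to be screened before an appropriate line is isolated.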

4 Conclusions and Future Directions

Overall, the last 40 years have witnessed major advances in zebrafish research, from the first mutagenesis screens to the use of EENs for targeted genome editing. With precise genomic manipulations now available, the zebrafish has caught up with other vertebrate organisms, combining genomic approaches previously restricted to the mouse with screening and high-resolution imaging techniques only possible in fish. This progress has promoted the development and use of mutant and transgenic lines in a wide range of research areas, and in neurobehavioral phenotyping research in particular. The following examples illustrate the wide spectrum of neural phenotypes studied, and their relevance to selected human brain disorders (see also Tables 2 and 3 for additional examples).

Table 2 Selected examples of aberrant neurobehavioral phenotypes demonstrated by mutant zebrafish lines
Table 3 Selected examples of aberrant neurobehavioral phenotypes demonstrated by transgenic zebrafish lines

The Allan–Herndon–Dudley syndrome (AHDS) is a rare developmental nervous system disorder characterized by severe intellectual disability, muscle hypotonia and spastic paraplegia. It is caused by mutations in the mct8 (slc16a2) gene located on the X chromosome that encodes a thyroid hormone transporter. Impaired Mct8 function is thought to prevent the entry of the active T3 hormone into neurons, leading to abnormal neurological development. While the MCT8 knockout mouse recapitulates the metabolic and endocrine defects seen in patients, it does not have any neurological or behavioral phenotype. In order to determine the functions of Mct8 in AHDS, Zada and colleagues used ZFNs to generate an mct8 zebrafish mutant line [185]. Video-tracking behavioral imaging as well as time-lapse imaging of neuronal circuits showed that mct8 zebrafish mutants had a reduced locomotor activity that correlated with defects in synaptic density of motoneuron arbors and abnormal axonal branching in sensory neurons. Additional behavioral defects were observed, including increased and more fragmented sleep, and altered responses to light variations. Thus, the use of ZFNs to induce a targeted mutation in mct8 in zebrafish led to the development of the first vertebrate model of AHDS that recapitulates the full spectrum of defects seen in patients. As zebrafish is particularly suited for high-throughput approaches, this mct8 mutant line could further be used in pharmacological screens for therapeutic development.

In addition to modeling neurological developmental diseases, zebrafish mutant lines have been used to study complex behaviors. Sleep, for instance, is an evolutionarily conserved state that is essential to all organisms, but whose regulation remains poorly understood. In particular, the factors that transmit circadian information to regulate sleep are largely unknown. Using TALENs as a mutagenesis approach, Gandhi and colleagues generated a new zebrafish line harboring a null mutation in the aanat2 gene that encodes an enzyme essential for melatonin synthesis [186]. Videotracking assays revealed that aanat2 mutants had a normal sleep pattern during daytime but reduced sleep and a longer sleep latency at night. Importantly, circadian rhythms were not disrupted in mutants, revealing for the first time that melatonin is not required to initiate or maintain the circadian clock. Altogether, the use of TALENs in this example allowed the generation of the first genetic loss-of-function model for melatonin in a diurnal vertebrate, and led to the discovery of the endogenous functions of melatonin in sleep regulation.

Complementary to mutants, transgenic lines have proved very helpful to decipher the mechanisms underlying complex behaviors and neurological disorders (see Table 3). Narcolepsy, for example, is a rare chronic sleep disorder involving excessive daytime sleepiness, sleep fragmentation and paralysis at night, hypnagogic hallucinations and cataplexy. It is caused by the selective degeneration of hypothalamic hypocretin/orexin (HCRT) neurons, whose activity is known to regulate several other behaviors including food intake, reward or drug addiction. To generate a zebrafish model of narcolepsy, Elbaz and colleagues used Tol2-mediated transgenesis to establish a stable transgenic line expressing the nitroreductase nfsB gene under the control of the hcrt promoter [187]. Exposing transgenic embryos to the drug metronidazole induces the apoptosis of cells expressing nfsB, providing an inducible method to selectively ablate HCRT neurons at specific times. As expected, transgenic embryos lacking HCRT neurons recapitulated the defects seen in narcoleptic patients, including increased sleep time and transitions between wake and sleep states. They further had altered locomotor responses to light and sound, suggesting a broader function of HCRT neurons in mediating behavioral responses to external stimuli.

Transgenesis not only allows the ablation or silencing of a specific class of neurons regulating complex behaviors, but can also be used to express specific mutations in genes causing human disorders. Amyotrophic lateral sclerosis (ALS) is an adult-onset lethal neurodegenerative disease characterized by the progressive loss of motor neurons. While the majority of ALS cases are sporadic, around 10 % are familial and caused by mutations in certain genes. Among them, mutations in the superoxide dismutase SOD1 gene have been associated with 20 % of familial ALS (fALS). Several zebrafish transgenic lines expressing the SOD1-G93A mutation have been generated using Tol2-mediated transgenesis [188, 189]. They all showed defects associated with fALS including abnormal motor neuron outgrowth and branching, loss of neuromuscular junctions, muscle atrophy and motor neuron cell loss leading to premature death. These new transgenic lines thus provide an additional system for observing the progression of ALS directly in vivo in an intact organism, and for isolating new effective compounds in therapeutic screens.

Overall, targeted mutagenesis and transgenesis have broadened the field of zebrafish research in many areas. The development of conditional mutant and targeted transgenic lines is now under way and will expand the repertoire of lines and resources currently available. For instance, one recent study has introduced attP sites at specific loci into the genome for future recombination-mediated site-specific transgenesis [190]. Other new approaches that could be adapted to zebrafish research include Cas9 engineering to regulate transcription [191–193]. The zebrafish model is looking at a bright future, one that George Streisinger would be proud of.