Introduction

Next generation sequencing has greatly accelerated experimental investigations into the genetic etiology of human diseases. The availability of high throughput sequencing data now makes “systems genetics” a possibility with the use of large-scale forward genetic screens for novel gene discovery (Caruana et al. 2013; Hill et al. 2013). Forward genetic screens in mice and other model organisms with chemical mutagenesis using ethylnitrosourea (ENU) have yielded new insights into the genetic basis for a wide spectrum of biologic and disease processes (Arnold et al. 2011; Guenet 2004), including genes that can cause congenital heart disease (CHD) (Shen et al. 2005; Yu et al. 2004; Zhang et al. 2009).

Although mutagenesis screens have been highly successful in recovering mutants with a variety of defect phenotypes for disease modeling, mutation recovery remains challenging. This process is becoming more straightforward with easier access to whole exome sequencing analysis (Bull et al. 2013), but mutation mapping is often still an important part of the mutation recovery effort to clearly identify the mutation(s) responsible for a given phenotype. This usually entails crossing the mutation generated in one inbred strain into a different inbred strain, and conducting a full genome scan to examine segregation of the disease phenotype with a panel of DNA markers polymorphic between the two strains (Bode et al. 1988; Xia et al. 2010; Zhang et al. 2009). However, a common problem with such mapping intercrosses is that the mutant phenotype can be modified or even lost due to genetic modifier effects (Nadeau 2001). To avoid the confounding effects arising from genetic modifiers, intercrosses can be carried out with more closely related mouse strains, such as intercrosses with the closely related C57BL/10 (B10) mouse strain to map mutations generated in the C57BL/6 (B6) background.

Mapping ENU induced mutations in mice can be carried out using a panel of 50–150 polymorphic DNA markers comprising a combination of microsatellite DNA repeats and single nucleotide polymorphism (SNP) markers distributed throughout the genome. A panel of B6/B10 polymorphic markers has been previously described for mapping ENU induced mutations in the B6 background (Xia et al. 2010). While an Illumina high density mouse SNP genotyping array containing approximately 1,400 markers and the mega-MUGA array with 77,000 markers are commercially available, these SNP arrays are not suitable for mapping polymorphisms between the closely related B6 and B10 mouse strains. Therefore, in this study, we investigated the feasibility and cost effectiveness of conducting a full genome scan using bulk amplicon sequencing with the Ion Torrent next generation sequencing system for mapping mutations in B6/B10 mice.

To test the efficacy of Ion Torrent sequencing for mutation mapping, we focused our analysis on Bishu, a newly recovered mutant mouse model exhibiting complex congenital heart disease (CHD) associated with heterotaxy obtained from an ongoing mouse ENU mutagenesis screen. We conducted a full genome scan with 123 SNPs using amplicon sequencing with the Ion Torrent and developed a custom bioinformatics pipeline for automating the analysis of the resulting next generation sequence data. Through this analysis, we mapped the mutation in Bishu to mouse chromosome 4 and identified the mutation as a splicing defect mutation in Dnaic1.

Results

Bishu, a mutant recovered from a mouse ENU mutagenesis screen conducted in the NHLBI Bench to Bassinet Cardiovascular Development Consortium, exhibited congenital heart disease (CHD) associated with laterality defects (Fig. 1). Bishu mutants can have complete reversal of visceral organ situs, known as situs inversus totalis (Fig. 1b), or they can exhibit heterotaxy, where the left–right positioning of organs may be randomized (Fig. 1c). A spectrum of complex CHD is usually observed in conjunction with heterotaxy, such as double outlet right ventricle (Fig. 1d, e) with multiple ventricular septal defects (Fig. 1f, g) and hypoplastic transverse arch (Fig. 1h). As Bishu was recovered from a C57BL/6J (B6) inbred strain background, Bishu mice were intercrossed with another inbred strain, C57BL/10J (B10), to generate B6/B10 hybrid offspring for the mapping analysis. These B6/B10 hybrid animals were intercrossed to generate homozygous mutants that were used for conducting a full genome scan. The mutation map interval was identified by tracking segregation of homozygous B6 markers with the Bishu disease phenotype.

Fig. 1

Congenital heart disease and left–right patterning defects in Bishu mutants. a–c Bishu mutants can exhibit abnormal visceral organ situs such as situs inversus (b), in which all organs show mirror symmetry, or heterotaxy (c) with discordant organ situs. The normal situs solitus animal (a) shows levocardia with the heart apex pointing to the left (denoted by white arrow), the normal pattern of three lung lobes on the right and two on the left, and the stomach positioned on the left. In contrast, the Bishu mutant with situs inversus (b) shows dextrocardia (white arrow pointing to the right) with reversed lung lobation and the stomach positioned on the right. Shown in c is a Bishu mutant with heterotaxy, with the heart showing a midline placement or mesocardia (white arrow pointing down the midline). This mutant had the stomach on the left (not visible in this image) and left pulmonary isomerism with bilateral single lung lobes. d–h Bishu mutant showing complex congenital heart disease involving malalignment of the great arteries. This mutant was stillborn and exhibited dextrocardia such that the morphological right ventricle (mRV) was positioned on the left. The atrial septum failed to form, giving a common atrium (f, g), and both the pulmonary outflow (PT) and the aorta were connected to the mRV, indicating a double outlet right ventricle. This was accompanied by multiple ventricular septal defects (VSD) and also narrowing of the transverse arch (denoted by arrowheads in h). i, j Cilia in tracheal epithelia of the Bishu mutant showed a missing outer dynein arm defect. Cilia of the mouse tracheal airway epithelia are shown in cross section view. In the control animal (i), the outer microtubule doublets showed abundant outer dynein arms (white arrowheads), but in the Bishu mutant cilia, very few outer dynein arms are observed (j)

Ion Torrent amplicon sequencing

Seven homozygous B6/B10 Bishu mutant animals with the CHD/laterality defect phenotypes were used for the genome scan mapping analysis. A panel of 123 B6/B10 polymorphic DNA markers previously shown to be effective for mapping mutations was used for this analysis (Xia et al. 2010). Primers were designed to generate short DNA fragments (150–300 bp) encompassing each SNP (Supplementary Table S1) to allow the use of the Ion Torrent 200 bp sequencing chemistry. For each mutant, sequence fragments spanning each of the 123 markers were PCR amplified and then bar coded with a unique sequence tag. In parallel, we also tested the efficacy of multiplex PCR in which a single amplification reaction was performed for all 123 DNA markers and the resulting product was similarly bar coded. The bar coded PCR products encompassing all 123 markers from all seven mutant animals were pooled, and a single library was constructed and processed for sequencing using a single 314 chip on the Ion Torrent Personal Genome Machine (PGM) (Fig. 2). The total sequence data output was aligned to a reference sequence created by concatenating the sequences of all the amplicons. Overall, we obtained 615,712 reads providing an average of ~500× coverage (Table 1). To assess the quality of the sequencing run, we examined the AQ20 score, which corresponds to the longest read length at which the error rate is 1 % or less. The mean AQ20 was 153 with 99 % of the library covered at this length. We also examined the “perfect” length score, corresponding to the longest length at which there were no mismatches. For 99 % of the reads, the mean “perfect” length score was 110 bp (Torrent Suite 2.2 User Documentation). Together these assessments show that the sequencing data generated are of high quality.
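For reference, the 1 % error threshold behind the AQ20 score corresponds to the standard Phred-scale relation between a quality value and an error probability. The symbols Q and p are introduced here for illustration only and do not appear in the run report.

```latex
Q = -10 \log_{10} p, \qquad p = 0.01 \;\Rightarrow\; Q = 20
```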

Fig. 2

Ion Torrent amplicon sequencing workflow. A schematic diagram of the steps involved in amplicon sequencing for genome scan analysis is shown. 150–200 bp around each SNP are amplified either individually or in a single multiplex PCR. Barcoded libraries generated from each mutant are then pooled to make a single library that is then amplified for sequencing on the Ion Torrent PGM

Table 1 Sequencing metrics

Ion Torrent amplicon sequencing analysis

To analyze the large volume of Ion Torrent sequencing reads and extract the genotype calls for each of the 123 SNPs, a custom bioinformatics pipeline was developed comprising a Perl script that uses the mpileup command from SAMtools (samtools.sourceforge.net/mpileup.shtml) to tabulate the genotype calls (see Supplementary Tables S2–S9). This automation streamlined the determination of genotype calls for all 123 SNPs (Table 2). Unambiguous genotype calls were obtained for all 123 SNPs that were PCR amplified individually (columns 3–9 in Table 2). In the multiplex PCR sample, genotype calls were successfully generated for 85 % of SNPs. These exhibited sequencing coverage comparable to that obtained with individual PCR amplifications (Column 9 in Table 2). To validate the results obtained by Ion Torrent amplicon sequencing, we carried out a parallel genotyping analysis of six mutants with a subset of 62 SNPs using Sanger capillary sequencing analysis. The genotype calls obtained by Sanger sequencing (Supplementary Table S10) were concordant with those identified by the PGM sequencing analysis, showing the efficacy of SNP genotyping by Ion Torrent amplicon sequencing.

Table 2 Genome scan results from Ion Torrent amplicon sequencing

Genome scan mapping and sequencing analysis identifies the Bishu mutation

The genome scan analysis mapped the Bishu mutation to one genomic interval, a 17 Mb region situated between SNP rs13477622 (Chr 4: 28249560) and rs49519173 (Chr 4: 45462131; mm9) in the proximal end of chromosome 4 (black box in Table 2). This region was observed to be consistently B6 homozygous in all 7 mutants. This map interval was confirmed with further analysis of the linkage data using recombinant interval haplotype analysis (Neuhaus and Beier 1998). Examination of this chromosome 4 region revealed a gene known to cause CHD and laterality defects, Dnaic1 (Guichard et al. 2001). Dnaic1 encodes an axonemal dynein intermediate chain required for motile cilia function, and is also known to be associated with primary ciliary dyskinesia (PCD), a sinopulmonary disease arising from mucus clearance defects due to motile cilia defects in the airway. Similar to Bishu mutants, PCD patients can also exhibit situs inversus totalis or heterotaxy, a reflection of the dual requirement for motile cilia in both left–right patterning and airway clearance.
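The interval search described above can be sketched as a scan for runs of consecutive markers that are homozygous B6 in every mutant. This is a minimal illustration and not the pipeline used in this study: the `Marker` and `Interval` records, the genotype labels ("B6", "B10", "HET") and the toy data are all hypothetical.

```python
from itertools import groupby
from typing import Dict, List, NamedTuple, Optional, Sequence


class Marker(NamedTuple):
    marker_id: str
    chrom: str
    pos: int  # base-pair position on the chromosome


class Interval(NamedTuple):
    chrom: str
    first_b6: Marker              # first marker homozygous B6 in all mutants
    last_b6: Marker               # last marker homozygous B6 in all mutants
    left_flank: Optional[Marker]  # nearest marker outside the run, if any
    right_flank: Optional[Marker]


def shared_b6_intervals(markers: Sequence[Marker],
                        calls: Dict[str, List[str]]) -> List[Interval]:
    """Return maximal runs of adjacent markers that are "B6" in every mutant."""
    intervals: List[Interval] = []
    ordered = sorted(markers, key=lambda m: (m.chrom, m.pos))
    for chrom, group in groupby(ordered, key=lambda m: m.chrom):
        chrom_markers = list(group)
        run_start: Optional[int] = None
        for i, marker in enumerate(chrom_markers):
            marker_calls = calls.get(marker.marker_id)
            # A marker with no calls at all never counts as shared B6.
            all_b6 = bool(marker_calls) and all(c == "B6" for c in marker_calls)
            if all_b6 and run_start is None:
                run_start = i
            at_end = i == len(chrom_markers) - 1
            if run_start is not None and (not all_b6 or at_end):
                run_end = i if all_b6 else i - 1
                intervals.append(Interval(
                    chrom,
                    chrom_markers[run_start],
                    chrom_markers[run_end],
                    chrom_markers[run_start - 1] if run_start > 0 else None,
                    chrom_markers[run_end + 1]
                    if run_end + 1 < len(chrom_markers) else None,
                ))
                run_start = None
    return intervals


# Toy example with made-up marker names and positions (two mutants).
toy_markers = [Marker("m1", "4", 10), Marker("m2", "4", 20),
               Marker("m3", "4", 30), Marker("m4", "4", 40),
               Marker("m5", "5", 5)]
toy_calls = {"m1": ["B6", "HET"], "m2": ["B6", "B6"], "m3": ["B6", "B6"],
             "m4": ["HET", "B6"], "m5": ["B10", "HET"]}
toy_result = shared_b6_intervals(toy_markers, toy_calls)
```

Reporting the flanking markers alongside each run makes it explicit whether an interval is bounded by recombinant markers or simply runs to the end of the marker panel on that chromosome.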

Sequencing of cDNA obtained from a Bishu mutant embryo revealed transcripts with exon 5 deleted (Fig. 3). Sequencing of genomic DNA from Bishu revealed a G to A substitution flanking the splice donor site of exon 4 (Dnaic1 c.204+1G>A), which would account for the observed exon 5 skipping and would predict a reading frame shift resulting in protein truncation after a 15 amino acid insertion beyond residue 76 of the Dnaic1 protein (p*76Argext*15, Fig. 3). Further genotyping analysis confirmed that all Bishu mutants are homozygous for this Dnaic1 mutation, validating this as the disease causing mutation. Consistent with Dnaic1 as the gene harboring the disease causing mutation, we observed that Bishu mutants have motile cilia defects. Thus, instead of the normal rapid synchronous ciliary beat driving fluid flow across the tracheal respiratory epithelia and in brain ependymal tissue, Bishu mutants exhibited immotile/slow/dyskinetic cilia with little or no net fluid flow (see Supplemental Movie). Consistent with the compromised ciliary motion in the ependyma, Bishu mutants surviving postnatally usually die from hydrocephalus by 5–10 days after birth. Furthermore, analysis by electron microscopy confirmed that Bishu mutant airway cilia are missing the outer dynein arms (Fig. 1i, j), a cilia ultrastructural defect associated with Dnaic1 mutations in mice and in PCD patients (Guichard et al. 2001; Ostrowski et al. 2010; Pennarun et al. 1999).

Fig. 3

Dnaic1 splicing defect mutation identified in Bishu mutant. Schematic of the mouse Dnaic1 gene is depicted with a point mutation in exon 4 positioned at the 3′ splice junction, resulting in the use of an alternative splice donor and acceptor site in exons 4 and 6, respectively. The mutation c.240+1G>A in the genomic region is highlighted in red and the wild type is highlighted in green. Also shown are the wild type and mutant transcripts, and the mutant polypeptide generated from the mutant transcripts without exon 5 (blue)

Efficiency of amplicon sequencing

To assess the efficacy of Ion Torrent amplicon sequencing for mapping analysis, we considered the cost of conducting a genome scan on different sample sizes on the Ion Torrent (Table 3). Based on the Ion Torrent PGM run report and coverage statistics, we expect that 100× coverage can be achieved with the multiplexing of 40 samples in a single run using the Ion 314 chip (Table 3). With the Ion 316 chip, 128 samples can be simultaneously sequenced using the available bar codes (Table 3), and many more can be accommodated as additional bar codes become available. With amplicon sequencing, there are also savings in personnel time from the automation of genotype calling, made possible by the bioinformatics pipeline we have developed for analysis of the Ion Torrent PGM short sequence reads. Thus, PGM amplicon sequencing is easily scalable to maximize the efficiency and cost effectiveness of high volume custom genotyping and genome wide mapping analyses.
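One way to arrive at an estimate of this kind is simple proportional scaling of the observed run. The sketch below assumes that the total read yield of a chip is fixed, that reads are split evenly across samples and markers, and that the run reported here carried eight barcoded libraries (barcodes 1 through 8 in Methods) at ~500× mean coverage. The function name and its parameters are hypothetical, and this may not be how the figures in Table 3 were derived.

```python
def max_samples_per_run(observed_mean_coverage: float,
                        samples_in_run: int,
                        target_coverage: float) -> int:
    """Scale an observed run to a target mean coverage per marker.

    Assumes a fixed total read yield that is divided evenly among samples.
    """
    if observed_mean_coverage <= 0 or samples_in_run <= 0 or target_coverage <= 0:
        raise ValueError("all inputs must be positive")
    total_sample_fold = observed_mean_coverage * samples_in_run
    # Round down: a partial sample cannot be sequenced.
    return int(total_sample_fold // target_coverage)


# ~500x mean coverage over 8 barcoded libraries, rescaled to 100x:
# 500 * 8 / 100 = 40 samples.
estimate = max_samples_per_run(500, 8, 100)
```

Under these assumptions the result agrees with the 40-sample figure quoted for the 314 chip. Uneven amplification across markers, as seen with multiplex PCR, would lower the number of samples that reach the target coverage at every marker.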

Table 3 Cost comparison for full genome scan linkage analysis

Discussion

We show the efficacy of next generation amplicon sequencing with the Ion Torrent PGM for genome scan mapping. Using this approach, we mapped an ENU induced mutation causing CHD associated with laterality defects in the Bishu mutant mouse line to a 17 Mb interval on mouse chromosome 4. This made it possible to identify the disease causing mutation as a splicing defect mutation in Dnaic1. The mutation would be predicted to cause loss of function, given that the splicing defect is predicted to generate a frame shift after amino acid 76. Consistent with this, Bishu mutants exhibit phenotypes similar to those found in the Dnaic1 knockout mice (Francis et al. 2012). Using this same amplicon sequencing method, we have performed genome scan analysis to map the mutations in two other mutant lines with congenital heart defects. This allowed the mapping of the mutation in each line to a 35 Mb region on mouse chromosome 15, confirming the general utility of this strategy for genome wide mapping analysis (unpublished observations; Damerla et al. 2013).

While our genome scan was conducted using 123 polymorphic DNA markers, the number of markers can be increased or decreased to accommodate the mapping resolution desired. The bioinformatics pipeline we developed for analysis of the next generation sequencing data allowed for rapid automated genotype calling that streamlined the mapping analysis. It should be noted that while there are commercial mouse SNP genotyping arrays, these are only available for more commonly used inbred mouse strains, with none available for the closely related B6/B10 inbred mice used in our studies. This points to the utility of amplicon sequencing for conducting custom genotyping analysis. Although our study was focused on mapping a mutation in the mouse genome, this same approach can be applied for genome scanning and genotyping analysis in any organism, including human clinical studies.

The emergence of bench top sequencing machines such as the PGM for next generation sequencing (Chan et al. 2012) has made it possible to scale sequencing projects to accommodate higher sample throughput while decreasing cost. Our studies showed the efficacy and scalability of amplicon sequencing for conducting genome scans and genotyping analysis. We showed the cost for amplicon sequencing can be significantly reduced by multiplexing the PCR amplifications and also using sample bar coding to reduce the time and cost for library construction. At present, up to 128 unique bar codes are available, and when combined with the use of the higher capacity 316 chip, this allows significant scale up of the sequencing run on the PGM. The addition of more bar codes is expected in the near future, which will further facilitate scale up of amplicon sequencing.

Previous reports have shown that amplicon resequencing can be used to identify novel mutations with targeted resequencing of selected genes (Daum et al. 2012; Otto et al. 2011). These studies also reported significant savings in costs and time in utilizing next-generation sequencing platforms over Sanger sequencing for targeted resequencing analysis. One previous concern in using next generation sequencing for resequencing analysis is its lower sequencing accuracy. This concern is abated with the much higher sequencing depth afforded by the current next generation sequencing platforms. Using data generated in this study, we found that sequencing depth with 100× coverage is sufficient to provide 100 % accuracy in genotyping calls.

Our findings show that amplicon sequencing using the PGM is cost effective for high throughput genome scan mapping analysis and is customizable and easily adapted for small- or large-scale studies. The flexibility of amplicon sequencing will make it possible to undertake custom genotyping analysis, whether in studies involving model organisms or clinical studies involving unique patient populations. Overall, this approach has general applicability for a wide variety of large-scale genotyping analyses and can be employed clinically, such as in HLA genotyping (Wang et al. 2012), pharmacogenomics studies, or other custom genotyping analyses such as those required for the clinical practice of personalized medicine.

Methods

Institutional approval for animal studies

All mouse experiments were carried out using protocols approved by the Institutional Animal Care and Use Committee of the University of Pittsburgh.

DNA samples and PCR amplification

The Bishu mutant analyzed in this study was recovered from a mouse ENU mutagenesis screen. The mutagenesis and breeding of mice were carried out as previously described (Yu et al. 2004). Skin tissue was collected from seven Bishu mutants and genomic DNA was extracted for the mapping analysis using 123 SNPs. Each SNP was amplified in a 50 μl PCR reaction containing 10× AmpliTaq Gold buffer, MgCl2, 0.15 mM dNTP mix, 0.1 μM each of forward and reverse primer, 1 unit AmpliTaq Gold polymerase, and 50 ng genomic DNA. PCR amplification was performed with an initial DNA denaturation at 95 °C for 5 min, followed by 40 cycles of 95 °C for 30 s, 55 °C for 30 s, and 72 °C for 1 min, and a final extension of 5 min at 72 °C. Multiplex PCR involved amplification of all the SNPs simultaneously in a single PCR reaction with all the primers mixed into the same PCR reaction mixture. All PCRs were performed in either DNA Engine Tetrad® 2 (Bio-Rad) or Mastercycler® nexus (Eppendorf) thermal cyclers.

Library preparation and Ion Torrent amplicon sequencing

The libraries were generated using the Ion Plus Fragment Library Kit (Cat. No. 4471252, Life Technologies). Amplicons generated by PCRs from each mutant were mixed in equal volumes. A total volume of 500 μl of mixed amplicons was used for a single purification reaction using Agencourt® AMPure® XP Reagent (Beckman Coulter) at a 1:1.8 ratio of DNA to beads in a 2 ml Eppendorf tube. Each tube was placed on a magnetic rack for 2 min followed by two washes with 70 % ethanol. After the ethanol was removed and the tube was air dried, the beads were resuspended in 50 μl low TE buffer. 1 μl of the pooled amplicons was analyzed on an Agilent® Bioanalyzer® using the Agilent® High Sensitivity DNA Kit and Bioanalyzer® software to determine the concentration of the amplicon pools. 100 ng of pooled amplicons from each sample was end-repaired using 20 μl end repair buffer and 1 μl end repair enzyme supplied in the Ion Plus Fragment Library Kit, and incubated at room temperature for 20 min. Agencourt® AMPure® XP Reagent was used to purify the end-repaired pooled amplicons as described above.

Ligation of adapters and barcodes

Barcodes 1 through 8 from the Ion Xpress 1-16 barcoding kit were used, one per sample library. Each barcode was combined with the end-repaired amplicons and the reagents supplied in the Ion Plus Fragment Library Kit, and the reactions were incubated in a thermal cycler at 25 °C for 15 min followed by 72 °C for 5 min. These barcoded pooled amplicons were further purified using the Agencourt® AMPure® XP Reagent.

Library quantification and template preparation

The library was quantified by qPCR using the Ion Library Quantitation Kit (Cat. No. 4468802) to determine a template dilution factor suitable for clonal amplification on Ion Sphere Particles. Template dilution factors for each of the barcoded libraries were established according to the kit protocol. In brief, 1:200 and 1:2,000 dilutions of each barcoded library were analyzed by qPCR against a standard curve generated from a prequantified standard E. coli library. qPCR reactions were carried out in a 7900HT System (Applied Biosystems). Template dilution factors were calculated, and the barcoded libraries were combined in equimolar ratios for template preparation. We used the automated procedure for template preparation with the Ion One Touch System and the Ion One Touch 200 template kit according to the manufacturer’s instructions.
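The standard-curve step can be illustrated with a generic least-squares fit of Ct against log10 concentration, followed by back-calculation of the undiluted library concentration. This is a sketch of the general qPCR technique and not the kit's own calculation: the function names and the numbers in the example are hypothetical. Converting the resulting concentration into a template dilution factor requires the target concentration specified by the kit protocol, which is not given here.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(concentrations: Sequence[float],
                       ct_values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(concentration) + intercept."""
    if len(concentrations) != len(ct_values) or len(concentrations) < 2:
        raise ValueError("need at least two matched standards")
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def library_concentration(ct: float, dilution: float,
                          slope: float, intercept: float) -> float:
    """Undiluted library concentration, in the same units as the standards."""
    diluted = 10 ** ((ct - intercept) / slope)
    return diluted * dilution


# Made-up standards: a tenfold dilution series with 3 cycles between points.
slope, intercept = fit_standard_curve([1.0, 0.1, 0.01], [10.0, 13.0, 16.0])
# A hypothetical library measured at Ct 13 after a 1:200 dilution.
undiluted = library_concentration(13.0, 200, slope, intercept)
```

Measuring two dilutions of each library, as described above, gives a simple consistency check: both should back-calculate to approximately the same undiluted concentration.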

Ion Torrent PGM sequencing and DNA sequence data analysis

Templates prepared from pooling all the barcoded libraries were sequenced on a 314 chip using the Ion Torrent PGM and the Ion PGM™ Sequencing Kit according to the manufacturer’s instructions. Ion Torrent reads were aligned to amplicon reference sequences using CLC bio Genomics Workbench software. Genotype calling for all the marker positions was automated using samtools mpileup (http://samtools.sourceforge.net) together with custom scripts. Positions with insufficient coverage (less than 20×) are denoted as “genotype unknown”.
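The calling rule can be sketched as follows for a single marker, starting from per-base read counts such as those that can be tabulated from pileup output. Only the 20× coverage threshold comes from the text. The allele-fraction cutoff for heterozygous calls, the genotype labels and the function name are assumptions made for illustration, and the custom scripts used in this study may differ.

```python
from typing import Mapping

MIN_COVERAGE = 20       # from the text: below 20x is "genotype unknown"
HET_MIN_FRACTION = 0.2  # assumption: minor allele fraction needed for a het call


def call_genotype(base_counts: Mapping[str, int],
                  b6_allele: str,
                  b10_allele: str,
                  min_coverage: int = MIN_COVERAGE,
                  het_min_fraction: float = HET_MIN_FRACTION) -> str:
    """Classify one SNP position as B6, B10, B6/B10 or genotype unknown."""
    depth = sum(base_counts.values())
    b6_reads = base_counts.get(b6_allele, 0)
    b10_reads = base_counts.get(b10_allele, 0)
    informative = b6_reads + b10_reads
    # Too few reads overall, or none carrying either expected allele.
    if depth < min_coverage or informative == 0:
        return "genotype unknown"
    b6_fraction = b6_reads / informative
    if b6_fraction > 1 - het_min_fraction:
        return "B6"
    if b6_fraction < het_min_fraction:
        return "B10"
    return "B6/B10"


# Hypothetical counts at a marker where B6 carries A and B10 carries G.
example_calls = [
    call_genotype({"A": 95, "G": 5}, "A", "G"),
    call_genotype({"A": 50, "G": 50}, "A", "G"),
    call_genotype({"A": 1, "G": 99}, "A", "G"),
    call_genotype({"A": 3, "G": 2}, "A", "G"),
]
```

Computing the allele fraction from reads carrying either expected allele keeps occasional sequencing errors at other bases from shifting the call, while the depth check still uses all reads at the position.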