Introduction

Museum specimens are valuable sources of DNA whenever sampling of fresh tissue is not possible (e.g., Culver et al. 2000; Payne and Sorenson 2002) and for comparisons of current and historical genetic variation (Vallianatos et al. 2002; Godoy et al. 2004; Johnson et al. 2004; Muñoz-Fuentes et al. 2005). Low concentration and quality of DNA, however, can make the genotyping of such samples difficult (Glenn et al. 1999; Sefc et al. 2003). Numerous notes of caution and suggestions for detecting and avoiding errors in microsatellite genotyping have been published (e.g., Taberlet et al. 1996; Gagneux et al. 1997; Mills et al. 2000; Miller et al. 2002; Bonin et al. 2004; Kalinowski et al. 2006). In contrast, population studies based on DNA sequences from museum specimens of recent vintage and non-invasive samples have rarely investigated the issue of errors, despite a significant literature on sequence accuracy associated with the analysis of “ancient DNA” (e.g., Cooper and Poinar 2000; Hansen et al. 2001; Hofreiter et al. 2001; Pääbo et al. 2004). Although mitochondrial DNA is more easily amplified from suboptimal DNA extracts than are nuclear genes (Cooper 1994), damaged template may cause incorrect bases to be incorporated in the PCR product. Such artifact substitutions have been observed in PCR amplifications of DNA from ancient samples (Pääbo et al. 2004) and formalin-fixed tissues (Williams et al. 1999; Quach et al. 2004; Akbari et al. 2005). The most frequent result is apparent C→T substitutions following deamination of cytosine (Hofreiter et al. 2001).

Here, we report on the frequency of PCR errors encountered in sequencing mtDNA from relatively recent avian museum specimens, and the resulting overestimation of genetic diversity in the historical sample. The sequencing artifacts were discovered when a population genetic study of brood parasitic indigobirds (Vidua spp., Viduidae) was supplemented with DNA extracts from feathers of 28–35 year old bird skins (Sefc et al. 2005). Although our study investigated indigobird speciation, our findings are relevant to conservation genetic studies employing historical samples for monitoring temporal changes in genetic diversity and demography.

Material and methods

As part of a population genetic study of indigobirds (Vidua spp.), we amplified and sequenced mitochondrial DNA from 28–35 year old museum specimens (specimens collected from 1966 to 1980; DNA extracted in 2001; n = 219). Birds were collected by shotgun (1966–1968) or were netted and then sacrificed using cardiopulmonary compression (1972–1973). Skins were prepared as museum specimens the same day and left to dry at ambient temperature and humidity. The specimens were stored in steel cabinets at the University of Michigan, Museum of Zoology, except for short periods when skins were handled once or twice a year. Apart from being heated to 18°C in winter, temperature and humidity in the museum varied with ambient conditions, ranging from cool and dry in winter to warm and humid in summer. DNA was extracted from the calamus of feathers plucked from the inner wing, so this tissue had no contact with other specimens and little or no exposure to UV light.

For museum samples, all pre-amplification steps were carried out in a separate room with dedicated equipment that has never been used for fresh tissue samples or PCR products. DNA was extracted from the calamus of one or two feathers (98 and 121 samples, respectively) with a QIAamp Tissue Kit (Qiagen, Valencia, California) supplemented with 3 mg dithiothreitol (DTT) for digestion of feather keratin, and eluted in a final volume of 200 μl. PCR amplification of 1,100 base pairs (most of the NADH dehydrogenase subunit 6 (ND6) gene, tRNA glutamine and the 5′ half of the control region) was achieved in three overlapping fragments of 448, 321, and 534 bp (see Sorenson and Payne 2001 for primer sequences). Forty-five PCR cycles were carried out in volumes of 50 μl using 1.25 U AmpliTaq Gold DNA Polymerase (Perkin Elmer, Boston, Massachusetts). Negative PCR controls were run with each batch of reactions. PCR was successful on the first attempt in > 92% of samples, but up to four PCR reactions were attempted before products were obtained for some of the samples.

PCR products were excised from agarose gels and purified with a Gel Extraction Kit (Qiagen). DNA sequences were obtained using a BigDye Terminator Cycle Sequencing Kit (Applied Biosystems) and an Applied Biosystems 377 DNA sequencer. Sequences were checked and assembled in Sequence Navigator (Applied Biosystems), and analysed in PAUP* (Swofford 2002). Both DNA strands were sequenced; the double-peaks and artifact substitutions described below were always observed in both strands of a given PCR product. Sequence data in GenBank (AF090341; AY322613-AY322833; AY865372-AY865554) have been updated with corrections based on the results reported here.

We also obtained sequences from 297 recently collected tissue samples, amplifying the same mtDNA region in two overlapping fragments (Sorenson and Payne 2001; Sorenson et al. 2003). The geographic distribution of museum specimens and fresh tissue samples was similar and comprised the same sets of indigobird species (see Sorenson et al. 2003; Sefc et al. 2005 for details). Separate analyses were completed for West Africa and southern Africa, reflecting a geographic split in mtDNA haplotypes and limited genetic differentiation among the species within each region (Sorenson et al. 2003; Sefc et al. 2005). “Southern indigobirds” from South Africa, Zimbabwe, Zambia, Malawi, and Botswana include museum samples collected in 1966/67 (n = 123) and 1973 (n = 6), and fresh tissue samples (n = 103). “West African indigobirds” from Cameroon, Nigeria, Ghana, Gambia, Mali, and Senegal include museum specimens collected in 1904 (n = 1), 1968 (n = 39), 1975 (n = 23), and 1979/80 (n = 27), and fresh tissue samples (n = 194). Haplotype and nucleotide diversity within individual populations are generally high (He = 0.88 ± 0.12; π = 0.0032 ± 0.0014; Sefc et al. 2005).

Indices of genetic diversity were calculated in DnaSP version 4.00 (Rozas et al. 2003). Effective population sizes of historical and modern samples were estimated in a Bayesian framework employing a model of exponential growth or decline during the sampling interval, as implemented in TMVP2P (Beaumont 2003) with the following settings: 500,000 MCMC updates; maximum population size = 20,000; sampling interval = 20 generations; size of importance sample = 200; thinning interval = 20; size of the proposal distribution of parameter updates = 0.4. Convergence was checked by two replicate runs for each dataset.
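For readers unfamiliar with these indices, the following minimal Python sketch shows how haplotype diversity and nucleotide diversity can be computed from aligned sequences with the standard unbiased estimators. It is not the DnaSP implementation; the function names and the toy alignment are hypothetical and serve only to illustrate the quantities.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Unbiased haplotype diversity: n / (n - 1) * (1 - sum(p_i ** 2))."""
    n = len(seqs)
    counts = Counter(seqs)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Toy alignment (hypothetical data): 4 sequences, 3 haplotypes, 8 sites.
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGCACGT"]
h = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
print(f"haplotypes={len(set(toy))} h={h:.3f} pi={pi:.4f}")
```

Because both estimators weight haplotypes by frequency, a few singleton haplotypes change the haplotype count much more than they change either diversity index.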

Results and discussion

Three kinds of evidence led to the conclusion that PCR errors affected the sequences we obtained from museum specimens. First, a total of 64 double peaks were observed in the sequence electropherograms for 34 of 219 museum specimens (Table 1). In contrast, no double peaks were observed in sequences obtained from fresh tissue samples. In all cases, double peaks comprised overlaid signals of either the two pyrimidines or the two purines, and most (53 of 64) occurred at positions that were otherwise invariant among > 500 indigobirds (including both museum specimens and recent samples). Fifty-one C + T double peaks occurred at positions with conserved C’s in the mitochondrial light strand, whereas two A + G double peaks occurred at positions with a conserved A and G, respectively, in other indigobirds. The remaining cases included nine additional C + T double peaks and two A + G double peaks at positions that were polymorphic among other indigobirds. Replicate PCR and sequencing reactions for a subset of the affected samples (n = 23 double peaks in 14 samples, including all 11 cases of double peaks at polymorphic positions) resolved 21 previously ambiguous C + T double peaks in favor of C and two A + G double peaks in favor of A. We therefore scored C + T double peaks in the remaining samples as C, provided that C was present in all other samples at the respective position.
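The scoring rule for non-replicated double peaks can be written as a short Python sketch. The function name and argument layout are hypothetical; the logic only restates the rule given above (C + T is scored as C when every other sample carries C), and any other case is returned as unresolved so that it can be sent to replicate sequencing.

```python
def score_double_peak(observed, other_bases):
    """Resolve a double peak under the rule described in the text.

    observed    -- the two overlaid bases, e.g. {"C", "T"}
    other_bases -- bases seen at this position in all other samples
    Returns the resolved base, or None if replicate sequencing is needed.
    """
    if set(observed) == {"C", "T"} and set(other_bases) == {"C"}:
        return "C"
    return None


# Hypothetical examples.
conserved = score_double_peak({"C", "T"}, "CCCCCC")    # position invariant for C
polymorphic = score_double_peak({"C", "T"}, "CCTCCC")  # position varies elsewhere
purine = score_double_peak({"A", "G"}, "AAAAAA")       # rule covers C + T only
print(conserved, polymorphic, purine)
```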

Table 1 Summary of apparent PCR errors in relation to (a) substitution type, (b) fragment length, and (c) extract concentration

Second, the above results led us to scrutinize our data for PCR errors that were not evident as double peaks, but resulted in unambiguous, albeit erroneous, base substitutions. In a haplotype network for 233 southern African indigobirds (130 museum specimens and 103 fresh tissue samples), autapomorphic substitutions leading to unique tip haplotypes were more frequent in sequences from museum specimens and the majority of these were C→T (n = 22 of 34) or G→A transitions (5), whereas only 3 were T→C (2) or A→G (1). In contrast, autapomorphic transitions in sequences from fresh tissue samples were fewer and if anything biased in the opposite direction (0 C→T; 3 T→C; 2 G→A; 5 A→G). Likewise, no substitution bias was found on internal branches radiating out from two common and presumably ancestral haplotypes (6 C→T; 3 T→C; 4 G→A; 6 A→G). We repeated PCR and sequencing for all museum samples that displayed autapomorphic substitutions (n = 54 substitutions in 40 samples), and found that 21 of 31 C→T, 2 of 9 G→A, 2 of 2 G→T, and 1 of 2 C→A substitutions in the original light strand sequences were not reproducible. One sample with a C→T transition had a C + T double peak at the same position in the replicate sequence. In addition, two novel and apparently erroneous C→T transitions appeared at other positions in the replicate sequences. As one would expect, genuine autapomorphies occurred mainly at third codon positions of the 477 bp within the ND6 gene (n = 5, 4, and 29 autapomorphies at codon positions 1, 2, and 3, respectively), whereas artifacts appeared to be randomly distributed (n = 4, 4, and 2 autapomorphies at codon positions 1, 2, and 3, respectively; Gadj = 9.81, df = 2, P < 0.01 using Williams’ correction for small sample size).
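The codon-position comparison can be checked with a short Python sketch of a G-test of independence with Williams' correction. The counts are those given above; the function name is hypothetical, and the correction factor is the textbook form for an r × c table, which we assume (but the text does not state) is the variant that was applied.

```python
import math


def g_test_williams(table):
    """G-test of independence for an r x c table with Williams' correction."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    g = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # empty cells contribute nothing to G
                expected = row_tot[i] * col_tot[j] / n
                g += 2.0 * obs * math.log(obs / expected)
    r, c = len(row_tot), len(col_tot)
    # Williams' correction factor q; G is divided by q.
    q = 1.0 + ((n * sum(1 / t for t in row_tot) - 1)
               * (n * sum(1 / t for t in col_tot) - 1)) / (6.0 * n * (r - 1) * (c - 1))
    return g / q, (r - 1) * (c - 1)


# Autapomorphies at codon positions 1, 2, 3 as given in the text.
genuine = [5, 4, 29]
artifact = [4, 4, 2]
g_adj, df = g_test_williams([genuine, artifact])
# For df = 2 the chi-square upper tail is exactly exp(-x / 2).
p_value = math.exp(-g_adj / 2)
print(f"G_adj={g_adj:.2f} df={df} P={p_value:.4f}")
```

Under these assumptions the sketch gives an adjusted G of about 9.81 with 2 degrees of freedom, in agreement with the value reported above.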

Third, the frequencies of both double-peaks and erroneous substitutions increased with length of the region amplified and were higher in sequences derived from one-feather extracts as compared to two-feather extracts (Table 1), suggesting a higher error rate when longer fragments are amplified from lower concentration extracts. A correlation between DNA template concentration and PCR error rate was also observed in microsatellite genotyping of these same DNA samples (Sefc et al. 2003) and in sequencing of formalin-fixed tissue extracts in other studies (Williams et al. 1999; Akbari et al. 2005).

The distribution of irreproducible transitions and transversions in our data differs significantly (χ² = 134.3, df = 3, P < 0.001) from the reported distribution of Taq polymerase errors (Hansen et al. 2001), suggesting that lesions in the template DNA are the source of most of the sequencing artifacts we observed. Both erroneous substitutions and double peaks likely result from the same types of DNA damage, the former being observed when amplification is initiated from a single damaged template strand, and the latter when amplification begins with one damaged and one or more undamaged strands. Artifact C→T changes in the L strand amplicon comprised ∼90% of the PCR errors observed in our data, and were most likely caused by deamination of cytosine to uracil or of 5-methyl-cytosine to thymine on the L strand (Hofreiter et al. 2001). Likewise, G→A changes in the L strand amplicon result from deamination on the H strand, but were much less frequent in our dataset (∼3%). This may reflect preferential amplification from the L strand template (Gilbert et al. 2003). Three G + A double peaks at positions with A in the L strand are ascribable to deamination of A to the guanine-analogue hypoxanthine, which occurs at 2–3% of the rate of cytosine deamination (Lindahl 1993). Finally, one C→A and two G→T transversions are perhaps explained by oxidation of guanine to 8-hydroxyguanine leading to the incorporation of A rather than C in the complementary strand (Lindahl 1993).

Overall error rates in our study were relatively low, involving ∼1 × 10⁻⁴ erroneous substitutions and ∼3 × 10⁻⁴ double peaks per base pair prior to replicate sequencing (total of ∼241,000 base pairs from museum samples), but affected 45 of 219 individual samples (21%) with a mean of 2.1 ± 1.1 artifacts per affected sample. Without the replicate sequencing described above, these errors would appreciably increase the number of unique haplotypes (by 20% in our total sample of southern and West African museum specimens) but have less effect on other measures of genetic diversity (e.g., < 1% overestimation of haplotype and nucleotide diversity) due to the low frequency of the artifact haplotypes. Reanalysis of our data after removal of artifact transitions slightly increased population differentiation values (by < 5%; southern indigobirds: mean of species-pairwise ΦST values from uncorrected data, 0.049; from corrected data, mean ΦST = 0.051; West African indigobirds: uncorrected, mean ΦST = 0.0235; corrected, mean ΦST = 0.0236) but had no effect on the statistical significance of results reported by Sefc et al. (2005); if anything, the low rate of errors reduced our power to detect population differentiation and was therefore conservative in the context of our earlier study.
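The per-base rates can be approximated from counts given elsewhere in the text, as in the Python sketch below. Two assumptions are ours and are not stated explicitly: that the full 1,100 bp region was obtained for all 219 museum samples, and that the erroneous substitutions counted are the irreproducible ones found by replicate sequencing (21 C→T, 2 G→A, 2 G→T, 1 C→A).

```python
n_samples = 219
region_bp = 1100                 # assumption: full region obtained for every sample
total_bp = n_samples * region_bp

irreproducible = 21 + 2 + 2 + 1  # C->T, G->A, G->T, C->A not reproduced on replication
double_peaks = 64

sub_rate = irreproducible / total_bp
dp_rate = double_peaks / total_bp
affected_fraction = 45 / n_samples

print(total_bp)
print(f"substitutions per bp: {sub_rate:.1e}")
print(f"double peaks per bp:  {dp_rate:.1e}")
print(f"affected samples:     {affected_fraction:.0%}")
```

Under these assumptions the totals come to 240,900 bp, with rates that round to the orders of magnitude quoted above.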

Overestimates of genetic diversity in historical samples might be of greater consequence in comparisons with contemporary populations, where the loss of rare alleles and reduced gene diversity provide evidence of population bottlenecks or declines (Luikart et al. 1998; Johnson et al. 2004), suggesting the need for replicate sequencing in such studies (e.g., MacHugh et al. 1999; Paxinos et al. 2002). In our experiments, most of the sequence artifacts occurred in the museum specimens of southern indigobirds (24 substitutions in 14 samples, and 56 double peaks in 28 samples), which may be due to the generally older collection dates of the southern museum samples. In the following, we investigate the effects of the observed artifact substitutions on estimates of population diversity and demography in the southern indigobird dataset (129 museum and 103 recent samples). The total number of historical haplotypes as well as the number of haplotypes unique to the historical population dropped considerably when sequencing errors were corrected. In contrast, haplotype and nucleotide diversity differed only slightly between the uncorrected and corrected museum sequences (Table 2). Clark and Whittam (1992) showed that sequencing errors affect diversity estimates most severely when the true level of diversity is low, which is often the case with taxa of conservation concern (e.g., Johnson and Dunn 2006; Lage and Kornfield 2006). Thus, while the effects of sequence errors on estimates of genetic diversity and differentiation were minimal for indigobirds, a similar error rate may produce greater discrepancies in analyses of populations with low genetic diversity. Even with our dataset, however, artifact substitutions in the museum samples led to misleading estimates of population size change, suggesting a 56% decline in effective population size between the museum sample and the fresh tissue sample. After correction of artifacts, the estimate of current population size increased almost threefold compared to the result obtained from the flawed dataset, and the decline between the museum and fresh samples was only 20% (Table 2).

Table 2 Number of haplotypes, haplotype and nucleotide diversity, and effective population size in the museum samples of southern indigobirds before and after correction of the artifact substitutions, and in the fresh tissue samples. Estimates of effective population sizes were obtained by a combined analysis of historical and contemporary data assuming a model of exponential growth or decline during the sampling interval

Recently, considerable attention has been paid to the error-proneness of microsatellite genotypes from hair and fecal samples, whereas concerns about the accuracy of DNA sequencing have been raised mainly in relation to ancient or formalin-fixed tissues. Our data demonstrate that sequencing artifacts are not confined to ancient and highly degraded DNA sources. The quality and concentration of the DNA extracts investigated here were sufficient for microsatellite genotyping with manageable error rates (Sefc et al. 2003), which were comparable to those encountered with noninvasive samples (e.g., Buchan et al. 2005). Sequence data from noninvasive samples may therefore suffer from error rates similar to those reported here for museum specimens.

We advocate heightened awareness of the potential for sequencing errors and greater efforts to verify sequence data in studies utilizing potentially problematic material, especially in population genetic studies in which inferences may depend on the presence and frequency distribution of rare haplotypes. Precautions that are applied prior to and during data collection and key criteria for authenticating ancient DNA data (e.g., Cooper and Poinar 2000) should be understood and followed where possible, although full implementation of these standards is probably overly demanding (and unnecessarily costly) for analyses of relatively recent DNA. Here we show that the accuracy of a large sequence dataset can be evaluated a posteriori by carefully examining sequence traces for double peaks and assessing the distribution of autapomorphies in relation to specimen age, substitution type, and affected codon position. Replicate PCR and sequencing can then be directed specifically to suspect individuals and data partitions or, if no suspicious data are detected, a random subset of sequences could be replicated to verify consistent results. As the frequency of single base errors is also related to extract concentration, the potential tradeoff between damage to the specimen and quality of the resulting sequence data should be considered in deciding the number and size of feathers or amount of skin tissue sampled from a museum specimen.
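One way to direct replicate sequencing to suspect individuals is sketched below in Python. This is a simplified screen, not the haplotype-network procedure used in this study: it treats a base carried by exactly one sample at a site as a candidate autapomorphy, compares it with the majority base, and flags only damage-like changes (C→T or G→A on the sequenced strand). The function name, data layout, and toy alignment are hypothetical.

```python
from collections import Counter

# Deamination-type changes as read on the sequenced (L) strand.
DAMAGE_LIKE = {("C", "T"), ("G", "A")}


def flag_suspect_singletons(alignment):
    """Return (sample, site, majority_base, variant_base) for damage-like singletons.

    alignment -- dict mapping sample name to an aligned sequence (equal lengths)
    Sites are reported with 0-based indices.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    flagged = []
    for site in range(length):
        column = [alignment[name][site] for name in names]
        counts = Counter(column)
        majority = counts.most_common(1)[0][0]
        for name, base in zip(names, column):
            if counts[base] == 1 and (majority, base) in DAMAGE_LIKE:
                flagged.append((name, site, majority, base))
    return flagged


# Toy alignment (hypothetical data).
toy = {
    "museum_1": "ACGTCCGA",
    "museum_2": "ACGTTCGA",  # singleton C->T at site 4: flagged
    "fresh_1":  "ACGTCCGA",
    "fresh_2":  "ACGCCCGA",  # singleton T->C at site 3: not damage-like
    "fresh_3":  "ACGTCCGA",
}
hits = flag_suspect_singletons(toy)
for hit in hits:
    print(hit)
```

Flagged samples are candidates for replicate PCR and sequencing rather than confirmed errors, since genuine autapomorphies can also be C→T or G→A.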