Introduction

Short tandem repeat markers on the Y chromosome (Y-STRs) have been very useful for forensic DNA analyses as well as for studies of human migration and evolution [1–5]. As more Y-STRs with a high degree of polymorphism are added to the Y-chromosome-specific DNA profile, the potential to distinguish between paternal lineages will increase. However, highly polymorphic STR loci may have high mutation rates because there is a natural correlation between the degree of polymorphism and the mutation rate of a given locus [6]. In paternity testing, spontaneous mutations in the germline of the putative father at any locus used in the analysis can lead to a false exclusion due to differences between father and son. Therefore, reliable estimates of Y-STR mutation rates are necessary for the accurate interpretation of Y-STR data in paternity testing and forensic casework [7, 8]. Moreover, knowledge of mutation rates is also essential for consistently dating the origin of Y-chromosomal lineages defined by single-nucleotide polymorphisms (SNPs) in molecular-anthropology studies [9–11]. To date, more than 200 Y-STR polymorphisms have been described, but studies on Y-STR mutation rates are still insufficient and have considered only a restricted number of markers [8, 10, 12–21].

We determined the haplotypes and the overall mutation rates for 22 Y-STRs (DYS19, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS385, DYS388, DYS437, DYS438, DYS439, DYS446, DYS447, DYS448, DYS449, DYS456, DYS458, DYS464, DYS635 and GATA H4.1) in 369 father–son pairs from 355 Korean families.

Materials and methods

DNA samples

A sample of 724 Korean males was analyzed, comprising 355 unrelated fathers and their 369 sons (one or two sons per father). Genomic DNA was extracted from buccal swab samples using the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s protocol.

Multiplex PCR amplification of 22 Y-STRs

Three multiplex PCR sets were constructed for the 22 Y-STRs. Y-multiplex I consisted of the minimal haplotype STRs (DYS19, DYS389I/II, DYS390, DYS391, DYS392, DYS393 and DYS385). Y-multiplex II consisted of DYS385, DYS438, DYS439, DYS437, DYS448, DYS456, DYS458, DYS635 (GATA C4) and GATA H4.1. Together, Y-multiplexes I and II covered the 17 Y-STRs of the AmpFlSTR® Yfiler™ kit (Applied Biosystems, Foster City, CA, USA) [22]. Y-multiplex III consisted of DYS385, DYS388, DYS446, DYS447, DYS449, and DYS464. DYS385 was included in all three multiplexes to check for sample switching. PCR amplifications were carried out in a final volume of 10 μl containing 0.5–1.0 ng template DNA, 1.6 μl of Gold ST*R 10X buffer (Promega, Madison, WI, USA), 2.0 U of AmpliTaq Gold® DNA polymerase (Applied Biosystems) and the appropriate concentration of primers (Table 1). Thermal cycling was conducted on the GeneAmp® PCR System 9600 (Applied Biosystems) or the PTC-200 DNA engine (MJ Research, Waltham, MA, USA) under the following conditions: 95°C for 11 min; 30 cycles of 94°C for 1 min, 55°C (Y-multiplex I) or 59°C (Y-multiplexes II and III) for 1 min, and 72°C for 1 min; and a final extension at 60°C for 45 min.

Table 1 Primer sequences and concentrations of three multiplex PCR sets for 22 Y-STRs

Electrophoresis and genotyping

The PCR products were mixed with GeneScan-400 HD (ROX) size standard (Applied Biosystems) and separated by capillary electrophoresis using an ABI PRISM 310 Genetic Analyzer (Applied Biosystems) and GeneScan software 3.1 (Applied Biosystems). PCR products at each STR locus were typed by comparison with an allelic ladder, which was constructed after confirming the sequences. Allele nomenclature followed the recommendations of the International Society for Forensic Genetics (ISFG) Commission [23], and allele designation was carried out using Genotyper 2.5 software (Applied Biosystems).

Identification of mutations

Mutations were identified by electrophoresis as allele length differences between father and son. Father–son pairs showing length mutations at Y-STR loci were analyzed using PowerPlex™16 (Promega) to confirm paternity. Mutations were confirmed by reanalysis and DNA sequence analysis. Before the sequencing reaction, samples carrying mutations were amplified in a monoplex reaction. PCR products were purified with ExoSAP-IT (USB, Cleveland, OH, USA), and sequencing analysis was performed on both DNA strands. For DYS385, DYS389I/II, DYS464, and DYS447 with allele duplication, each PCR product was cloned using pGEM®-T Easy Vector System I (Promega) following the manufacturer’s recommendations. Thereafter, sequencing analysis was performed on each cloned allele.
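The first screening step described above, flagging father–son allele length differences, can be sketched in a few lines of Python. This is only an illustration, not the authors' software: the function names and the DYS19 and DYS390 genotypes are hypothetical, while the DYS464 pattern (12-14-16 → 12-14-15-16) is the one reported in this study. The sketch assumes alleles are recorded as repeat numbers and that, for multi-copy loci, sorted alleles can be paired one-to-one only when the copy number is unchanged.

```python
def find_length_differences(father, son):
    """Return {locus: (father_alleles, son_alleles)} for loci whose allele sets differ."""
    diffs = {}
    for locus in father:
        if locus not in son:
            continue  # locus not typed in the son; nothing to compare
        f = tuple(sorted(father[locus]))
        s = tuple(sorted(son[locus]))
        if f != s:
            diffs[locus] = (f, s)
    return diffs


def repeat_step(f_alleles, s_alleles):
    """Signed net repeat change, or None when the number of alleles differs.

    Assumption: with equal copy number, sorted alleles are paired one-to-one.
    When the copy number changes, gain or loss cannot be assigned.
    """
    if len(f_alleles) != len(s_alleles):
        return None
    return sum(s - f for f, s in zip(sorted(f_alleles), sorted(s_alleles)))


# Hypothetical genotypes (repeat numbers); only the DYS464 pattern is from the text.
father = {"DYS19": (15,), "DYS390": (23,), "DYS464": (12, 14, 16)}
son = {"DYS19": (15,), "DYS390": (24,), "DYS464": (12, 14, 15, 16)}

candidates = find_length_differences(father, son)
for locus, (f, s) in candidates.items():
    print(locus, f, "->", s, "step:", repeat_step(f, s))
```

Any locus flagged this way is only a candidate; as described above, paternity confirmation and sequencing are still needed before it is counted as a mutation.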

Statistical analysis

The relative frequencies of haplotype occurrences and haplotype diversities were calculated according to Nei [24] using the Arlequin statistical analysis package Version 2.000 [25]. Mutation rates were estimated as the number of mutations divided by the number of allele transmissions. Confidence intervals (CI) for mutation rates were estimated from the binomial standard deviation [26]. Mutation frequency estimates were compared among populations using an exact test based on a Markov-chain procedure, as implemented in the Arlequin statistical analysis package Version 2.000 [25]. The paternity index was calculated according to Evett and Weir [27].

Results and discussion

Haplotypes for 22 Y-STRs in Koreans

The haplotype distribution in a sample of 355 unrelated Korean males for the 22 Y-chromosomal STRs (DYS19, DYS389I/II, DYS390, DYS391, DYS392, DYS393, DYS385, DYS388, DYS437, DYS438, DYS439, DYS446, DYS447, DYS448, DYS449, DYS456, DYS458, DYS464, DYS635, and GATA H4.1) is shown in the electronic supplementary material (Table S1). Allele calls for the multi-copy DYS464 were based solely on the peaks that were present (conservative typing method), not on a combination of alleles and peak height ratios (expanded typing method) [28]. The number of different haplotypes, the number of unique haplotypes and the haplotype diversity values are given in Table 2. Using 22 Y-STRs, a total of 350 haplotypes (98.6%) were obtained from 355 individuals; of these, 345 haplotypes (97.2%) were observed once and five haplotypes (2.8%) were observed twice. The sample yielded an overall haplotype diversity value of 0.9999.

Table 2 Haplotype diversity of 22 Y-STRs in 355 unrelated Koreans

For the minimal haplotype STRs (DYS19, DYS389I/II, DYS390, DYS391, DYS392, DYS393, and DYS385), 278 haplotypes (78.3%) were observed from the same population with the haplotype diversity value of 0.9972. From these, 239 haplotypes (67.3%) were observed once, 25 were observed twice (14.1%), six were observed three times (5.1%), three were observed four times (3.4%), two were observed six times (3.4%), and the other three haplotypes were observed seven, eight, and nine times, respectively (2.0, 2.3, and 2.5%, respectively).

For the extended haplotype STRs (minimal haplotype STRs, DYS438 and DYS439), 307 haplotypes (86.5%) were observed with the haplotype diversity of 0.9986. Among these, 278 haplotypes (78.3%) were observed once, 22 were observed twice (12.4%), two were observed three times (1.7%), one was observed four times (1.1%), two were observed five times (2.8%), and the other two were observed six and seven times, respectively (1.7 and 2.0%, respectively).

In the case of 17 Y-STRs (extended haplotype STRs, DYS437, DYS448, DYS456, DYS458, DYS635, and GATA H4.1) from AmpFlSTR® Yfiler™ kit, 340 haplotypes (95.8%) were observed with the haplotype diversity value of 0.9996. From these, 330 haplotypes (93.0%) were observed once, eight were observed twice (4.5%), and the other two haplotypes were observed four and five times, respectively (1.1 and 1.4%, respectively).
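The haplotype diversity values quoted above can be checked from the reported frequency spectra alone. The sketch below assumes the usual unbiased form of Nei's estimator, h = n/(n − 1) · (1 − Σ pᵢ²), where n is the number of individuals and pᵢ the relative frequency of haplotype i. The function name and the dictionary layout (times observed → number of distinct haplotypes) are illustrative choices and are not part of Arlequin.

```python
def haplotype_diversity(spectrum):
    """Nei's unbiased haplotype diversity from a frequency spectrum.

    spectrum maps 'times a haplotype was observed' -> 'number of such haplotypes'.
    """
    n = sum(times * count for times, count in spectrum.items())
    sum_p2 = sum(count * (times / n) ** 2 for times, count in spectrum.items())
    return n / (n - 1) * (1.0 - sum_p2)


# Frequency spectra as reported in the text for 355 unrelated males.
spectra = {
    "22 Y-STRs": {1: 345, 2: 5},
    "minimal haplotype": {1: 239, 2: 25, 3: 6, 4: 3, 6: 2, 7: 1, 8: 1, 9: 1},
    "extended haplotype": {1: 278, 2: 22, 3: 2, 4: 1, 5: 2, 6: 1, 7: 1},
    "17 Y-STRs (Yfiler)": {1: 330, 2: 8, 4: 1, 5: 1},
}

results = {name: haplotype_diversity(s) for name, s in spectra.items()}
for name, h in results.items():
    print(f"{name}: h = {h:.4f}")
```

Working from the spectrum rather than from individual haplotypes is sufficient here because the estimator depends only on how often each haplotype occurs, not on its allelic composition.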

Mutation rate estimates for 22 Y-STR loci

A total of 9,226 allele transmission events from 369 father–son pairs were analyzed at 22 Y-STRs. In this survey, 36 mutations were observed: one each at DYS389I, DYS390, DYS393, DYS437, DYS446, and GATA H4.1; two each at DYS19, DYS385, DYS389II, DYS439, DYS447, and DYS456; three each at DYS458 and DYS635; five at DYS464; and seven at DYS449 (Table 3). No father–son pair showed mutations at more than one locus. The locus-specific mutation-rate estimates ranged from 0.0 to 19.0 × 10⁻³ per generation. The average mutation rate was 2.7 × 10⁻³ (95% CI 1.2–5.1 × 10⁻³) across the minimal haplotype STRs, 2.7 × 10⁻³ (95% CI 1.4–4.9 × 10⁻³) across the extended haplotype STRs, 3.4 × 10⁻³ (95% CI 2.1–5.1 × 10⁻³) across the 17 Y-STRs from the AmpFlSTR® Yfiler™ kit, and 3.9 × 10⁻³ (95% CI 2.7–5.4 × 10⁻³) across all 22 Y-STRs.
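As a worked illustration of the estimate "mutations divided by allele transmissions", the sketch below computes the overall rate and the DYS449 rate, each with a binomial confidence interval. The interval used here is the exact (Clopper–Pearson) one, found by bisection; this is an assumption made for the example and is not necessarily the procedure of reference [26]. The function names are hypothetical, and the DYS449 denominator assumes one transmission per father–son pair.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def clopper_pearson(k, n, alpha=0.05, iters=100):
    """Exact two-sided (1 - alpha) binomial interval for k events in n trials."""
    def solve(is_below_root):
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2.0
            if is_below_root(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Lower bound: P(X >= k | p) = alpha / 2, increasing in p.
    lower = 0.0 if k == 0 else solve(lambda p: 1.0 - binom_cdf(k - 1, n, p) < alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, decreasing in p.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p) > alpha / 2)
    return lower, upper


def mutation_rate(mutations, transmissions):
    """Point estimate: observed mutations per allele transmission."""
    return mutations / transmissions


# Counts taken from the text; 369 transmissions for DYS449 is an assumption.
overall_rate = mutation_rate(36, 9226)
overall_ci = clopper_pearson(36, 9226)
dys449_rate = mutation_rate(7, 369)
dys449_ci = clopper_pearson(7, 369)

print(f"overall: {overall_rate * 1e3:.2f} x 10^-3, 95% CI "
      f"{overall_ci[0] * 1e3:.2f}-{overall_ci[1] * 1e3:.2f} x 10^-3")
print(f"DYS449:  {dys449_rate * 1e3:.2f} x 10^-3, 95% CI "
      f"{dys449_ci[0] * 1e3:.2f}-{dys449_ci[1] * 1e3:.2f} x 10^-3")
```

The exact interval is asymmetric around the point estimate, which matters for rare events such as these, where a symmetric normal approximation can understate the upper bound.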

Table 3 Sequence information for 35 mutations observed at 16 of the 22 Y-STRs

The average overall mutation rates estimated for 9–17 STRs in various populations [8, 10, 12, 13, 15–20, 29] ranged from 1.6 × 10⁻³ [18] to 4.3 × 10⁻³ [17]. Including our data, these values are not significantly different from each other (P > 0.05), except for the comparisons between the present study and those of Budowle et al. [18] and Gusmão et al. [20] (P < 0.05). However, the average mutation rate for the minimal haplotype STRs is similar across these populations, mostly in the range of about 1.5–3.0 × 10⁻³ [8, 10, 12, 13, 15, 16, 19, 20, 25]. For the minimal haplotype STRs, the average mutation rate in the present study does not differ significantly from those of Budowle et al. [18] and Gusmão et al. [20] (P > 0.05). This indicates that differences in haplogroup-specific mutation rates are not significant [15].

Characteristics of Y-STR mutations

Sequence analysis showed that all 36 mutations occurred within an uninterrupted array of seven or more repeats that are completely homogeneous in size and sequence (Table 3). In most cases, mutations occurred at the most frequent allele or at alleles longer than the most common allele in the fathers’ samples. A comparison of the mean repeat length of non-mutated and mutated alleles (pre-mutation allele size) revealed that the mutated alleles were always longer or the same, except for two at DYS449. Mutations observed at compound STRs always occurred in the longest array of homogeneous repeats.

In addition, all mutations were single-step, which supports the generally accepted stepwise mutation model [29, 30]. However, one mutation at DYS464 (12-14-16 → 12-14-15-16) seems to be the result of simultaneous allele duplication and single-step mutation events (Fig. 1, Father 4–Son 4). At DYS447, it also could not be determined whether a mutation (23–24 → 24) resulted from a simple one-step mutation event or from the deletion of the father’s duplicated allele (Fig. 2). In these two cases, gain or loss of a repeat could not be determined. Accordingly, a 16:18 ratio of gains vs losses of STR repeats was observed among the remaining mutations, representing no bias. This result differs from those of several previous studies [8, 10, 12–21], which demonstrated a statistically significant surplus of gains.
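One way to check whether a 16:18 split of gains and losses is compatible with no directional bias is an exact two-sided binomial test against a probability of 0.5. The text does not state which test, if any, was applied, so the sketch below is an assumption made for illustration; the function names are hypothetical.

```python
import math


def binom_pmf(i, n, p=0.5):
    """P(X = i) for X ~ Binomial(n, p)."""
    return math.comb(n, i) * p ** i * (1 - p) ** (n - i)


def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    observed = binom_pmf(k, n, p)
    tail = sum(binom_pmf(i, n, p) for i in range(n + 1)
               if binom_pmf(i, n, p) <= observed * (1 + 1e-9))
    return min(1.0, tail)


gains, losses = 16, 18  # counts reported in the text
p_value = two_sided_binomial_p(gains, gains + losses)
print(f"gains:losses = {gains}:{losses}, two-sided exact P = {p_value:.3f}")
```

With only 34 informative mutations, such a test has limited power, so a non-significant result does not rule out a modest excess of gains or losses.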

Fig. 1 Electropherograms for DYS464 in five father–son pairs with mutation events

Fig. 2 Electropherogram for DYS447 in a father–son pair with a mutation at a duplicated allele

Also, contrary to the results presented by Dupuy et al. [15], no clear excess of repeat losses was observed for longer alleles. Gains were significantly more frequent than losses for longer alleles of DYS449 and DYS464. As pointed out by Gusmão et al. [20], this seems to be related to the notion that microsatellite mutations are biased towards expansion up to a certain repeat length, at which the rates of expansion and contraction become equal [31].

Table 4 shows the age distribution of the fathers involved in the mutation events and the age-group-specific mutation rates. For the 36 men with a mutation, the father’s age at the birth of the son was 25–40 years, with an average of 31.3 years; for the fathers without a mutation, the average age at the birth of the son was 31.0 years. No significant difference was observed between the ages of fathers with mutations and those of the whole sample (P > 0.05). This is consistent with the results obtained by Kayser et al. [8] and Dupuy et al. [15], but not with those of Gusmão et al. [20].

Table 4 Number of mutations in different age groups according to father’s age at son’s birth

Compilation of available mutation data on Y-STRs

Mutation rates for the 22 STRs were estimated by pooling our data with previously published Y-STR mutation data for 1–17 STRs [8, 10, 12–18, 20, 21] (Table 5). The mutation rate estimates for DYS446, DYS447, and DYS449 are reported here for the first time. At the 22 Y-STRs, a total of 184 mutations were observed in 77,831 allele transmissions, giving a frequency of 2.36 × 10⁻³ (95% CI 2.03–2.73 × 10⁻³). The highest locus-specific mutation rates from the present study and previous reports were, in descending order, DYS449 (18.97 × 10⁻³), DYS458 (8.38 × 10⁻³), DYS635 (5.66 × 10⁻³), DYS456 (5.59 × 10⁻³), and DYS439 (5.37 × 10⁻³) (Table 3). Although the mutation rate for DYS449 had not been reported before, these STRs appear to be the most mutation-prone of the loci examined. As expected, almost all of these loci have relatively high gene diversity values and long most common alleles. Because allele frequencies differ among populations, allele-specific or locus-specific mutation rates may vary to some extent. However, sample sizes per allelic class for each STR are not yet sufficient to allow an accurate estimation of allele-specific mutation rates.

Table 5 Comparison of Y-STR mutation rates between the present study and a summary of previous literature

Conclusion

This study presents haplotype data and mutation rates for DYS388, DYS446, DYS447, DYS449, and DYS464 as well as for the 17 Y-STRs in the AmpFlSTR® Yfiler™ kit, obtained from 369 father–son pairs from 355 Korean families. The compilation of Y-STR mutation events from the present study and previous studies gives an overall average mutation rate estimate of 2.36 × 10⁻³ (95% CI 2.03–2.73 × 10⁻³). However, to obtain intralocus mutation rate estimates and to increase the reliability of paternity testing, more family analyses involving more Y-STRs should be performed.