Introduction

Closely linked markers are of benefit in forensics only if they are jointly analysed as haplotypes. Early approaches to haplotyping date back to the time when haemogenetical kinship testing was performed using blood group markers such as Rh and MNS. The Rh region, for example, contains three diallelic loci (C/c, D/d and E/e), which have rather low discriminating power on their own, but the eight haplotypes of the combined marker set provide a powerful blood group system [15].

Whilst the discovery of classical linked markers such as CDE and MNSs was mostly coincidental, progress in molecular science now facilitates the systematic characterisation of useful linkage groups. Recently, we introduced linkage group DXS101–DXS7424 on Xq22 as a valuable means of forensic testing [5]. In this work we describe another X-linked marker cluster that can be employed to assess comparatively distant blood relationships. DXS6801, DXS6809 and DXS6789 are located in a 3-Mb region on Xq21, spanning approximately 3–6 cM. In the German population, the three markers exhibit eight, 13 and 11 alleles, respectively [6, 8, 11]. Theoretically, the cluster could therefore give rise to 1,144 different haplotypes.

In males, simple genotyping of X-chromosomal short tandem repeats (STRs) reveals single alleles and haplotypes at the same time. In females, haplotypes can also be determined if the biological father or at least one son is available for testing. This ease of haplotyping implies that whole X chromosomes can often be traced through large pedigrees. Although marker transmission breaks down at father–son relationships, X-chromosomal haplotyping can nevertheless assist in solving difficult cases of disputed kinship [17]. However, the proper use of haplotypes requires knowledge about genetic linkage, haplotype frequencies and linkage disequilibrium (LD) in the appropriate population. Here, we present haplotype data of the three X-chromosomal STRs in question, as obtained in a German sample. We will also illustrate the power of X-chromosomal haplotype analysis using two examples of complex pedigree-based kinship testing.

Materials and methods

Samples

Blood samples and buccal swabs were taken from 1,061 unrelated Germans (806 male, 255 female). Male samples mainly originated from routine paternity cases (n=604). DNA from 255 mothers and at least two sons each was genotyped to estimate the genetic distances between DXS6801, DXS6809 and DXS6789. Typing of this group provided information on both linkage and male haplotypes (n=202). Blood samples were predominantly taken from students and their families, and therefore had to be anonymised before processing.

DNA typing

DNA was extracted from blood samples and buccal swabs using the NucleoSpin blood kit and the NucleoSpin tissue kit, respectively (Macherey-Nagel, Dueren, Germany). Triplex PCR was carried out to amplify dye-labelled STR products, using previously published primers [6, 8, 11]:

DXS6801

PF-Tamra-CATTTCCTCTAACTTGTCTCC

PR-CAGAGAGTCAGAATCAGTAG

DXS6809

PF-Fam-CTAGATTATGTAGGAATTTGG

PR-TCTGGAGAATCCAATTTTGC

DXS6789

PF-Hex-GTTGTTACTTAATAAACCCTCTTT

PR-AAGAAGTTATTTGATGTCCTATTGT

Primer concentrations were 0.3 μM for DXS6801, 0.09 μM for DXS6809 and 0.06 μM for DXS6789. PCR was carried out in a 25 μl reaction volume containing 1–5 ng DNA, 100 μM of each dNTP, 5 μg BSA (Sigma 3350, München, Germany), 2.0 mM MgCl2, 1.5 U AmpliTaq Gold, and 2.5 μl AmpliTaq Gold buffer II (ABI, Foster City, CA). The PCR protocol comprised initial denaturation at 95°C for 11 min, followed by 30 cycles of 93°C for 60 s, 56°C for 60 s and 72°C for 60 s. PCR products were analysed by capillary electrophoresis using ABI Prism 310 and 3100 sequencers (polymer POP 4). Allele calling was based upon sequenced alleles.

Linkage and haplotype analysis

DNA from buccal swabs was genotyped for 255 mother–offspring constellations with at least two sons involved. Due to the hemizygosity of gonosomes in males, X-chromosomal haplotypes could be identified directly by locus-wise STR typing. Crossing-overs were counted and genetic distances (in centiMorgan, cM) calculated using Kosambi’s mapping function. Haplotype frequencies were estimated from 202 sons and another 604 unrelated males, all genotyped for the three STRs in question. Linkage disequilibrium (LD) between pairs of alleles of different markers was assessed for statistical significance by means of a chi-square test with two-sided Monte Carlo p values, calculated using StatXact-4.
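The distance calculation described above can be sketched as follows. This is a minimal illustration, assuming that the recombination fraction is estimated as the proportion of recombinant meioses among all informative meioses; the function names and the counts in the usage lines are hypothetical and are not taken from the study data.

```python
import math

def recombination_fraction(recombinants: int, informative_meioses: int) -> float:
    """Estimate theta as the proportion of recombinant informative meioses."""
    if informative_meioses <= 0:
        raise ValueError("need at least one informative meiosis")
    return recombinants / informative_meioses

def kosambi_cm(theta: float) -> float:
    """Kosambi map distance in cM: d = 25 * ln((1 + 2*theta) / (1 - 2*theta))."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))

# Hypothetical counts: 5 recombinants among 100 informative meioses.
theta = recombination_fraction(5, 100)
distance_cm = kosambi_cm(theta)
```

For small recombination fractions, the Kosambi distance in cM is close to 100 times the recombination fraction, so the mapping function matters little for markers as tightly linked as those studied here.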

Creation of a genetic map of the human X chromosome

A total of 18 X-chromosomal markers, including DXS6801, DXS6809 and DXS6789, were used for complex kinship testing (Table 1). The genetic localisation of nine of these markers was included in, and obtained from, the Marshfield database (http://www.marshfieldclinic.org/research/genetics/) whilst the physical location of all but two markers (DXS9895 and GATA172D05) could be retrieved from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/). In order to generate a coherent genetic map for the purpose of likelihood calculations, we performed non-linear regression of the Marshfield coordinates (dependent variable y, in cM) on the NCBI coordinates (independent variable x, in Mb) for those eight markers for which both localisations were known. Fitting a two-parameter modified power function, using SigmaPlot v8.03 (SPSS Inc. 2001), gave the best fit for $y = 32.68\,(x^{0.23} - 1)$, with $r^2 = 0.9978$. This function was then used to extrapolate the physical location of markers DXS9895 and GATA172D05 from their respective Marshfield coordinates. Finally, pair-wise inter-marker recombination fractions θ were obtained from physical distances using Kosambi’s mapping function and assuming that 1 Mb of physical distance corresponds, on average, to 1 cM of genetic distance (Table 1).
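The two conversion steps of this paragraph can be sketched as follows. The sketch assumes the fitted parameters 32.68 and 0.23 quoted above and the stated equivalence of 1 Mb and 1 cM; the function names and the input values in the usage lines are hypothetical and serve only to exercise the code.

```python
import math

A, B = 32.68, 0.23  # fitted parameters quoted in the text

def genetic_from_physical(x_mb: float) -> float:
    """Fitted curve: Marshfield position y (cM) from NCBI position x (Mb)."""
    return A * (x_mb ** B - 1.0)

def physical_from_genetic(y_cm: float) -> float:
    """Inverse of the fitted curve, x = (y / A + 1) ** (1 / B)."""
    return (y_cm / A + 1.0) ** (1.0 / B)

def theta_from_mb(distance_mb: float, cm_per_mb: float = 1.0) -> float:
    """Inverse Kosambi function, theta = 0.5 * tanh(2 * d), with d in Morgans."""
    d_morgans = distance_mb * cm_per_mb / 100.0
    return 0.5 * math.tanh(2.0 * d_morgans)

# Hypothetical inputs, not taken from Table 1.
x_est = physical_from_genetic(60.0)   # physical position (Mb) for y = 60 cM
theta_3mb = theta_from_mb(3.0)        # recombination fraction for a 3-Mb interval
```

The inverse function is what would be needed to place a marker with a known Marshfield coordinate on the physical map, as was done here for DXS9895 and GATA172D05.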

Table 1 Complex kinship analysis of case 1, using 18 X-chromosomal DNA markers

Complex kinship testing

In the following, all family members have been re-named in order to protect their privacy.

Case 1

Two sisters (“Pia” and “Lea”) and their putative aunt (“Eva”) asked for kinship testing since they had good reason to assume that the sisters’ maternal grandfather was not Eva’s biological father (Fig. 1). This case was initially analysed using 19 autosomal markers: D3S1358, vWA, D16S539, D2S1338, D8S1179, ACTBP2, D19S433, TH01, FIBRA, D21S11, D18S51, Penta E, D5S818, D13S317, D7S820, D16S539, CSF1PO, Penta D, and TPOX [1, 9]. At the same time, we initiated X-chromosomal testing and investigated the Xq21 cluster described above together with 15 additional X-chromosomal markers reported earlier [24, 6, 7, 10, 12, 16]. In addition, we decided to include Eva’s son “Tom” in our investigations.

Fig. 1

Complex kinship test (case 1) involving X-chromosomal STR haplotyping. Unfortunately, Pia and Lea exhibit exactly the same ChrX STR genotype so that the maternal and paternal haplotypes cannot be distinguished. However, it is highly likely that Eva and both Pia and Lea share a haplotype spanning the non-shaded region

Likelihood calculations in case 1

The pedigree analysis program MLINK [13] was used to calculate the likelihood of the genotypes of the two sisters and their presumed aunt and cousin, assuming that “Eva” and the sisters’ untyped mother were either paternal half-sibs (hypothesis H0) or unrelated (H1). Likelihood calculations were based upon published STR allele frequencies and the pair-wise recombination fractions given in Table 1. Since the number of possible haplotypes segregating in the family was too large to perform exact likelihood calculations, likelihoods were instead approximated assuming that the sets of genotypes of all family members combined formed a first-order Markov chain over markers (or clusters of markers) that were not in LD. In other words, if $G_1, \ldots, G_n$ denote the sets of genotypes of n markers, and if none of the first j markers is in LD with any of the remaining $n-j$ markers, then

$$P(G_1, \ldots, G_n) = \frac{P(G_1, \ldots, G_j)\,P(G_j, \ldots, G_n)}{P(G_j)} = \frac{P(G_1, \ldots, G_{j+1})\,P(G_{j+1}, \ldots, G_n)}{P(G_{j+1})} \tag{1}$$

was assumed to hold for the probability $P(G_1, \ldots, G_n)$ of observing the respective genotype sets. Under this assumption, the overall log-likelihood ratio, log10(LR), of H0 versus H1 equals the sum of the log10(LR) values of individual marker clusters, overlapping by exactly one marker, minus the sum of the log10(LR) values obtained for these “bridge” markers (labelled log10(LRb) in Table 1).
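The bookkeeping implied by this decomposition can be sketched as follows. The sketch assumes that the clusters form a chain in which each pair of neighbouring clusters shares exactly one bridge marker, so that k clusters involve k−1 bridge markers; the function name and the numerical values are hypothetical and do not reproduce Table 1.

```python
def combined_log10_lr(cluster_log_lrs, bridge_log_lrs):
    """Sum of cluster log10(LR) values minus the sum for the shared bridge markers."""
    # Assumption: a chain of k overlapping clusters has k - 1 bridge markers.
    if len(bridge_log_lrs) != len(cluster_log_lrs) - 1:
        raise ValueError("k overlapping clusters share k - 1 bridge markers")
    return sum(cluster_log_lrs) - sum(bridge_log_lrs)

# Hypothetical values for three clusters linked by two bridge markers.
total = combined_log10_lr([1.2, 0.8, 2.0], [0.3, 0.5])
```

Subtracting the bridge terms corresponds to the division by the probability of the bridge marker genotypes in Eq. 1, which prevents each bridge marker from being counted twice.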

Case 2

Whilst no DNA samples could be obtained from the putative father (PF) or putative grandparents of disputed child “Nora”, two brothers (“Jim” and “Joe”) of the PF were available for testing (Figs. 2, 3). Stage 1 of the paternity test was carried out with the same routinely used 19 autosomal markers as in case 1. In stage 2, we tested the Xq21 cluster plus some additional STRs described earlier [24, 6, 7, 10, 12, 16]. Since X-chromosomal genotyping revealed an unambiguous exclusion of paternity for Jim’s and Joe’s brother, no likelihood calculations were performed.

Fig. 2

Paternity exclusion (case 2) by means of X-chromosomal STR haplotyping. In two regions, Jim and Joe exhibit the same maternal haplotype. Markers localised there are considered uninformative since they reveal only one maternal allele. Uninformative markers are shaded in grey

Fig. 3

Forensic markers on the X-chromosome. Localisation of, and distances between, markers are based upon NCBI data

Results

Amplification reliability

The triplex PCR protocol for the Xq21 STR cluster gave reliable results when 1–5 ng DNA was used as template. The product size ranges of the STRs did not overlap. Therefore, the dye combination can potentially be re-designed for more complex investigations so as to allow the inclusion of additional markers in multiplex fashion.

STR allele frequencies

Allele frequencies in the German population of markers DXS6801, DXS6809 and DXS6789 have been published before [6, 8, 11]. The present study nearly doubled the sample size upon which these estimates were based, but no notable differences were observed between our initial and the expanded sample. Therefore, no single STR data will be presented here.

Haplotype frequencies

Markers DXS6801, DXS6809 and DXS6789 exhibit eight, 13 and 11 alleles, respectively, in the German population [35]. This cluster could theoretically give rise to 1,144 different haplotypes. In fact, testing of 806 males revealed the presence of 207 different haplotypes. The frequency of the most abundant haplotype (11–33–20) still does not exceed 5%, and more than 89% of haplotypes have a frequency <1%.
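The counts in this paragraph follow from simple arithmetic and direct counting of male haplotypes. The following minimal sketch assumes that haplotype frequencies are estimated as relative counts; the function name and the four example haplotypes are hypothetical and are not the study data.

```python
from collections import Counter
from math import prod

def haplotype_frequencies(haplotypes):
    """Relative frequency of each distinct haplotype (simple counting estimate)."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}

# Allele numbers per marker as quoted in the text: 8 x 13 x 11 combinations.
theoretical = prod([8, 13, 11])

# Hypothetical male haplotypes (DXS6801, DXS6809, DXS6789), not study data.
sample = [(11, 33, 20), (11, 33, 20), (12, 31, 21), (11, 34, 20)]
freqs = haplotype_frequencies(sample)

# Share of distinct haplotypes with an estimated frequency below 1%.
rare_share = sum(1 for f in freqs.values() if f < 0.01) / len(freqs)
```

Because males are hemizygous, each typed male contributes exactly one haplotype, so no phase inference is needed before counting.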

Owing to limited space, Table 2 only lists haplotypes which occur with frequencies >0.01. The complete haplotype distribution of our sample will be published as Electronic Supplementary Material.

Table 2 Haplotype analysis in 806 German males of the DXS6801–DXS6809–DXS6789 cluster on Xq21. Only haplotypes with MLE >0.01 are included

Linkage analysis

A total of 255 mothers with at least two sons were genotyped to assess the pair-wise inter-marker recombination fractions. Since the degree of polymorphism of DXS6801, DXS6809 and DXS6789 differs considerably, with heterozygosity values of 0.626, 0.808 and 0.742, respectively, the number of informative meioses was also found to vary between loci. On average, genetic distances as estimated from our data were slightly higher than the values suggested by the NCBI and Marshfield databases (Table 3). However, in view of the known difficulties in quantifying linkage between very closely linked loci [18], our figures do not appear to differ significantly from the database values.

Table 3 Linkage analysis, physical and genetic maps of markers at Xq21.1 and Xq22

Linkage disequilibrium

Owing to their hemizygosity in males, linkage disequilibrium (LD) can be studied efficiently for X-chromosomal markers (Table 4). Tests of pair-wise association revealed significant LD between DXS6801 and DXS6809 (two-sided Monte Carlo p=0.0003), DXS6801 and DXS6789 (p=0.0001) and between DXS6809 and DXS6789 (p<0.0001).
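The general idea of a Monte Carlo test of association between two markers can be sketched as follows. This is a simplified permutation analogue, not the StatXact-4 procedure used in the study: it computes one global chi-square statistic over the whole allele-by-allele table and obtains the p value by shuffling the alleles of the second marker among males. All names, the number of permutations, the seed and the example data are hypothetical.

```python
import random
from collections import Counter

def chi_square(pairs):
    """Pearson chi-square statistic for the allele-by-allele contingency table."""
    n = len(pairs)
    row = Counter(a for a, _ in pairs)
    col = Counter(b for _, b in pairs)
    obs = Counter(pairs)
    stat = 0.0
    for a, ra in row.items():
        for b, cb in col.items():
            expected = ra * cb / n
            stat += (obs.get((a, b), 0) - expected) ** 2 / expected
    return stat

def monte_carlo_p(pairs, n_perm=2000, seed=1):
    """Permutation p value: shuffle alleles of the second marker among males."""
    rng = random.Random(seed)
    observed = chi_square(pairs)
    first = [a for a, _ in pairs]
    second = [b for _, b in pairs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(second)
        # Small tolerance so that ties with the observed statistic are counted.
        if chi_square(list(zip(first, second))) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical male haplotypes at two markers with complete association.
linked = [(10, 30)] * 20 + [(11, 31)] * 20
p_linked = monte_carlo_p(linked)
```

Adding one to both numerator and denominator keeps the estimated p value strictly positive, so its smallest attainable value is determined by the number of permutations.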

Table 4 Linkage disequilibrium between Xq21 markers DXS6801, DXS6809 and DXS6789 in 806 German males

Examples of complex kinship testing using X-chromosomal STRs

Case 1

Assuming equal prior odds, kinship testing with 19 autosomal markers yielded a probability of 88.4% for an aunt–niece relationship (“null hypothesis”, H0) in the family depicted in Fig. 1 (details about autosomal STR typing not shown). Whilst such a value does not generally provide sufficient certainty to probands, X-chromosomal genotyping led to an unambiguous result. Sisters “Pia” and “Lea” were found to share at least one allele for all X-chromosomal markers tested, which confirms full sisterhood and allows discrimination between the paternally and maternally inherited haplotypes at some loci. Typing of alleged aunt “Eva” and her son “Tom” clarified the X-chromosomal haplotypes of Eva. Obviously, Lea and Eva shared an identical haplotype over a wide range of markers, including the Xq21 cluster. A log10-likelihood ratio of 4.562439−1.442701=3.119738 was obtained in favour of H0, which implies that, assuming equal prior odds, the probability of Eva and Tom being aunt and cousin, respectively, of Pia and Lea equals 1,317.4617/(1+1,317.4617)=0.9992415 (or 99.92%).
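The conversion from the log10-likelihood ratio to the posterior probability reported above can be retraced with a few lines of code. The sketch uses the two log10 values quoted in the text and a prior probability of 0.5, corresponding to equal prior odds; the function name is hypothetical.

```python
def posterior_probability(log10_lr: float, prior: float = 0.5) -> float:
    """Posterior probability of H0 from a log10 likelihood ratio and a prior."""
    lr = 10.0 ** log10_lr
    prior_odds = prior / (1.0 - prior)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# Values quoted in the text for case 1: cluster sum minus bridge marker sum.
log10_lr = 4.562439 - 1.442701
w = posterior_probability(log10_lr)
```

With equal prior odds the posterior odds equal the likelihood ratio itself, which is why the text can write the probability directly as LR/(1+LR).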

Case 2

Of the 19 routine autosomal markers, also used in case 1, only ACTBP2 gave an exclusion of paternity in case 2 (Fig. 2). Brothers “Jim” and “Joe” of the alleged father carried four different ACTBP2 alleles and none of these matched the obligate paternal allele of disputed child “Nora” (data not shown). However, since such a pattern is also explicable by non-paternity in the grandparental generation, X-chromosomal testing was deemed more reliable.

If two brothers of a putative father (PF) carry different alleles at an X-chromosomal marker, this constellation is informative in the sense that it narrows down the set of possible alleles of the PF. In the present case, single STR typing revealed such a pattern and a consequent exclusion of paternity for DXS8378 and DXS10011. However, since DXS10011 is strongly prone to mutation, the exclusion was still regarded as weak. Anyhow, whilst single STR results for the Xq21 cluster were uninformative, haplotyping showed that the necessarily paternal allele combination 14–31–14 could hardly have been inherited from the PF. For the PF to have received this haplotype from the putative grandmother (PGM), two recombinations would have been required. Since the probability of such a double recombination is only $0.051 \times 0.046 = 2.35 \times 10^{-3}$, i.e. of the same order of magnitude as the mutation rate of most STRs, the PF could unequivocally be excluded from paternity.
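The double-recombination probability quoted above is the product of the recombination fractions of the two intervals. The following minimal sketch reproduces this arithmetic, assuming, as the product in the text does, that recombination events in the two intervals occur independently; the function name is hypothetical.

```python
def joint_recombination(thetas):
    """Probability of recombination in every interval, assuming independence."""
    p = 1.0
    for theta in thetas:
        p *= theta
    return p

# Interval recombination fractions quoted in the text for case 2.
p_double = joint_recombination([0.051, 0.046])
```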

Discussion

We have previously shown [5] for closely linked markers DXS7424 and DXS101 that X-chromosomal haplotyping can considerably facilitate complex kinship testing. However, since the segregation of chromosomal segments in a family occurs at random, even a panel of well-established markers may well be completely uninformative in a given case. An example of this was provided by the Xp22–Xp21 and Xq26–Xq28 markers in our case 1. The central region of the X chromosome, in contrast, comprising markers DXS7132 to DXS101, revealed sufficient proof of the presumed aunt–niece relationship. Case 2 further highlights the benefits of haplotyping in deficiency cases. Whilst single STR testing at Xq21 was uninformative, haplotyping led to an unequivocal exclusion of paternity.

The two cases re-emphasise the fact that the chance of success in X-chromosomal kinship testing critically depends upon the availability of a dense range of markers and marker clusters. Correspondingly, our current efforts to evaluate the Xq21 cluster were aimed at improving the density of X-chromosomal markers, readily formatted for use in forensic practice. Furthermore, our finding that 89% of haplotypes of this cluster occur at a frequency of less than 1% in the German population illustrates that our efforts have been worthwhile, and that the cluster provides a powerful new tool for kinship testing.

Before X-chromosomal STRs can be routinely used in forensic practice, it is necessary to investigate, and confirm, the inter-marker recombination rates expected on the basis of existing genetic or physical maps. Linkage between markers DXS6801, DXS6809 and DXS6789, as observed in the present study, appears to be looser than suggested by the Marshfield and NCBI databases. However, taking the width of the associated confidence intervals into account, our results do not differ significantly from those of other groups. In a particular testing situation where Xq21 haplotyping leads to an exclusion of paternity (see, for example, case 2 above), we nevertheless suggest being conservative and assuming the distance between DXS6801 and DXS6789 to be 6 cM, rather than 3 cM.

We observed strong LD between DXS6801, DXS6809 and DXS6789. For the time being, this result should be regarded as limited to our sample from North-east Germany. It cannot necessarily be generalised to other populations. That LD may be population-specific has been shown before for DXS7424 and DXS101. Whereas LD was observed between the two markers in a German sample [5], Lee and colleagues [14] failed to observe LD in the Korean population. In any event, when LD exists, haplotype frequencies cannot be calculated from allele frequencies but have to be estimated directly from appropriate population samples. Since the corresponding databases need to be large for the estimates to be sufficiently accurate, a wider practical application of newly characterised X-chromosomal markers may be problematic. However, nationwide collaborations and the exchange of data between forensic genetics laboratories should eventually allow researchers to overcome this problem.