Introduction

Genetic profiling can be used to identify biological material found at a crime scene or to prove that no manipulation of biological samples occurred in clinical, toxicological or doping control tests [e.g. 1, 2]. Within this context, the strength of a match between the genetic profile of a questioned sample and the genetic profile of the corresponding person (the suspect) is generally evaluated with a likelihood ratio (LR) [3]. The higher the LR, the more support is given to the hypothesis that the sample originated from the suspect rather than from another person unrelated to the suspect.

LR values strongly depend on the genetic marker considered, and individual LRs obtained from independent nuclear DNA (nDNA) loci can be multiplied. Consequently, the use of short tandem repeat (STR) multiplexes that contain 10 (e.g. AmpFLSTR SGM Plus from Applied Biosystems, Rotkreuz, Switzerland) or 15 (e.g. PowerPlex 16 from Promega) unlinked nDNA markers generally allows high LRs to be obtained. The mitochondrial genome is maternally transmitted and lacks recombination [4]. The genetic information recovered from this haploid genome thus always corresponds to a single marker, whatever the number of nucleotides or regions sequenced. LRs for mitochondrial DNA (mtDNA) are therefore generally much lower than those for nDNA. Moreover, mtDNA markers allow identification of a maternal lineage, whereas nDNA markers enable individualization of a person. A major advantage of mtDNA is that its number of copies in mammalian cells may exceed 1,000 times the number of nDNA copies [5]. MtDNA markers thus represent an alternative to nDNA markers for analysing biological samples that are poor in nucleated cells and/or were stored under poor conditions [e.g. 6].

In some contentious situations, the identity of urine samples involved in doping control analysis has to be ascertained. Genetic typing of this kind of material is generally difficult because it may have been kept for several weeks or months at 4°C. Furthermore, urine from healthy individuals, especially males, contains very few nucleated cells [2, 7, 8] and may shelter yeast cells and bacteria that can be detrimental to the amplification of human DNA [1, 7, 9]. Some urine components such as urea may further inhibit the PCR [9, 10].

In this study, we evaluated the potential to genetically identify male urine samples stored under suboptimal conditions. For this purpose, 20 urine samples were kept for 6, 60 and 120 days at 4°C. DNA was extracted with both phenol–chloroform (P/C) and silica-based protocols and analysed with both nDNA (SGM Plus kit) and mtDNA markers (HVI and HVII). These data were used to evaluate the two DNA extraction protocols as well as to compare the identification potential of nDNA and mtDNA markers.

Materials and methods

Sample collection and DNA extraction

Urine (i.e. questioned samples) and blood (i.e. reference samples) samples were collected from 20 volunteer male donors of European Caucasian origin. Women were not considered in this study because female urine contains more DNA and is therefore easier to genotype [2, 7, 8]. Each urine sample was divided into 3×25-ml aliquots that were immediately stored at 4°C. DNA was extracted from these samples after 6, 60 and 120 days. The 25-ml aliquots corresponding to the first two storage periods were centrifuged for about 40 min at 4,000 rpm to obtain pellet volumes of around 250 μl. In order to maximize the recovery of cellular material, samples from the last period were concentrated with a Centricon Plus-20 (Millipore AG, Volketswil, Switzerland), following the manufacturer's instructions. The resulting pellets or concentrates were divided into two equal parts; one part was extracted with a P/C protocol [11] and the other one with the QIAamp viral RNA Mini Kit (Qiagen AG, Basel, Switzerland), following the manufacturer's recommendations. This latter kit co-extracts viral RNA and cellular DNA from body fluids and is suitable for the genotyping of urine samples [12]. Blood samples were stored at −20°C until they were extracted with the QIAamp DNA Mini Kit according to the manufacturer's instructions. All DNA extracts were concentrated by membrane-based size exclusion with Microcon 30 spin columns (Millipore) in order to get 20–25 μl final volume. Genomic DNA was quantified with a dot-blot method using the Quantiblot system from Applied Biosystems.

Nuclear STRs amplification and mtDNA sequencing

Amplification of nDNA was carried out with the AmpFLSTR SGM Plus Multiplex (Applied Biosystems), following the manufacturer's instructions, but in 25-μl reaction volumes. This kit enables ten STR loci (D3S1358, VWA, D16S539, D2S1338, D8S1179, D21S11, D18S51, D19S433, TH01, FGA) and the gender marker Amelogenin to be tested. In order to enhance the sensitivity of the amplification, extracts that contained less than 200 pg/μl DNA were amplified with 34 PCR cycles using a multi-tube approach [13]. The amplicons were run on an ABI 310 Genetic Analyzer and sized with the internal lane standard and allelic ladder using the Genescan and Genotyper software (Applied Biosystems). Profiles obtained with 34 PCR cycles were scored following the guidelines from [13].

The two hypervariable domains of the mitochondrial control region, HVI and HVII, were amplified with the primer pairs L15896:CAAATGGGCCTGTCCTTGTA/H16414:TGTGCGGGATATTGATTTC (O. Froment, personal communication) and L00029:GGTCTATCACCCTATTAACCAC/H00408:CTGTTAAAAGTGCATACCGCCA [14], respectively. Amplifications were carried out in 25-μl reaction volumes with 1 μl template DNA, 0.5 μM of each primer, 2.5 μl 10× enhancer PCR buffer (Invitrogen AG, Basel, Switzerland), 2.5 μl PCR enhancer solution (Invitrogen), 0.2 mM of each dNTP (Takara; Axon Lab AG, Baden-Dättwil, Switzerland), 8 μg BSA, 2 mM MgSO4 (Takara) and 0.625 U Ex Taq DNA polymerase hot start (Takara). The PCR programme consisted of 30 s at 94°C, 30 s at 56°C and 60 s at 72°C over 36 cycles on a GeneAmp PCR System 9700 (Applied Biosystems). PCR products were purified with QIAquick PCR purification kits (Qiagen), following the manufacturer's instructions. Sequencing reactions were performed in 20-μl reaction volumes using the BigDye Terminator Cycle sequencing kit version 1.1 (Applied Biosystems). After being cleaned with Centri-sep spin columns, sequencing products were analysed on an ABI 310 Genetic Analyzer with the sequencing module Seq POP6 Rapid (1 ml), filter set E. Sequences were visualized and edited with the Data Collection, Sequence Analysis, Sequence Navigator and Factura software from Applied Biosystems.

Statistical analysis

The two DNA extraction protocols were compared according to the concentrations of DNA recovered and to the numbers of SGM Plus loci that were successfully scored after amplification. The significance of these comparisons was evaluated with Wilcoxon's rank tests. An LR was used to quantify the strength of a match between a urine sample and the corresponding blood sample. LRs per locus were calculated using the formula of Balding and Nichols [15], using θ=0.01 to represent the level of co-ancestry within the Caucasian population [16]. Individual LRs were multiplied across loci. The LR for the mtDNA marker was estimated, taking into account the number of observations of the haplotype i in a database of n haplotypes corrected for sampling errors [15]. The nDNA allele frequencies were taken from [17, 18]. European Caucasian mtDNA sequences were extracted from [19–23], as well as from a database from our own laboratory. Information gathered from these two kinds of genetic markers was considered independent. This implied that nDNA and mtDNA LRs could be multiplied to estimate overall LRs.
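As a minimal sketch of the per-locus calculation described above, the following Python code uses the commonly cited θ-corrected match probabilities attributed to Balding and Nichols. It assumes that the LR at a locus is the reciprocal of the match probability and that no further correction was applied; the function names and the input layout are illustrative and are not taken from the study.

```python
from math import prod

THETA = 0.01  # co-ancestry coefficient stated in the text


def match_probability(p_a, p_b=None, theta=THETA):
    """Theta-corrected genotype match probability (assumed Balding-Nichols form).

    p_a, p_b: population frequencies of the two alleles of the genotype.
    Pass only p_a (p_b=None) for a homozygous genotype.
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_b is None:  # homozygote a/a
        return (2 * theta + (1 - theta) * p_a) * (3 * theta + (1 - theta) * p_a) / denom
    # heterozygote a/b
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom


def locus_lr(p_a, p_b=None, theta=THETA):
    """LR for one locus, taken here as 1 / match probability (assumption)."""
    return 1.0 / match_probability(p_a, p_b, theta)


def profile_lr(loci, theta=THETA):
    """Multiply per-locus LRs across independent loci.

    loci: iterable of tuples, (p_a,) for homozygotes or (p_a, p_b) for heterozygotes.
    """
    return prod(locus_lr(*alleles, theta=theta) for alleles in loci)
```

For a homozygous genotype with an allele frequency of 0.056, `locus_lr(0.056)` evaluates to about 160, which agrees with the single-locus value reported for sample U3 in the Results.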

Results

DNA concentrations and DNA typing

The amounts of nDNA recovered with the two extraction protocols dramatically decreased over the 4-month period (Fig. 1). The detection threshold of the dot-blot method used to estimate DNA concentrations was around 30 pg/μl because the band of the standard corresponding to this concentration was always detected. Important quantitative differences were observed between the two extraction protocols. After 6 and 60 days, about four times more nDNA was recovered with P/C compared to the Qiagen kit (Fig. 1). After 120 days, variable amounts of DNA were detected in 50% of the samples extracted with the P/C protocol, whereas a complete absence of signal on the dot-blot characterized all samples extracted with the Qiagen kit (Fig. 1). The differences observed between the two treatments were highly significant for the three periods considered (P<0.005; Wilcoxon's rank tests). Comparable numbers of loci were obtained for urine samples with both extraction protocols (Fig. 2), and no significant differences were detected, whatever the period considered (P>0.050; Wilcoxon's rank tests). However, PCR conditions used to amplify the nDNA varied between treatments. Of the 60 extracts obtained with the P/C and Qiagen extraction procedures, 37 and 17%, respectively, contained at least 200 pg/μl DNA and were amplified with standard PCR conditions. Remaining samples were considered as low copy number (LCN) and amplified between three and six times with 34 PCR cycles. Despite the occurrence of allelic drop-out and false alleles associated with LCN analysis, all nDNA profiles obtained from urine samples perfectly matched those of reference samples.

Fig. 1 Average DNA concentrations of 20 urine samples stored at 4°C for 6, 60 and 120 days. DNA was extracted with a P/C protocol and the QIAamp viral RNA Mini Kit (QIA). Vertical bars represent standard deviations. Differences were highly significant for the three periods (P<0.005; Wilcoxon's rank tests)

Fig. 2 Same as Fig. 1, but for the mean numbers of reliable SGM Plus loci (without Amelogenin). None of the three comparisons was significant (P>0.050; Wilcoxon's rank tests)

Selective sequencing was conducted for samples in which fewer than four nDNA loci could be scored. Sequences of the HVI and HVII regions were obtained for 100% of the 17 samples matching this criterion, whatever the extraction protocol used and the period considered. The 17 urine mtDNA haplotypes were identical to those of the 12 corresponding reference samples (Table 1).

Table 1 Numbers of loci successfully scored (loci) and nDNA LRs for urine samples (U1–20) analysed after the three storage periods

Sample identification

Likelihood ratios were estimated only for samples treated with the P/C extraction protocol because it significantly improved the recovery of DNA from urine samples (Fig. 1). Overall, nDNA LRs greater than 1,000,000 characterized nDNA profiles having between five and ten loci (Table 1). Values between 6,073 and 47,390 were associated with profiles having three to four loci. Finally, nDNA LRs from 37 to 278 were calculated for profiles with one or two loci. With only one locus, an nDNA LR of 160 was obtained for the sample U3 after 4 months (Table 1). This sample was homozygous for the allele 16 at the locus D19S433. This rare allele has a frequency of 0.056 in the Swiss reference population. Conversely, two loci were scored for the sample U16 after 4 months, but the nDNA LR was only 61. Indeed, the genotype of this sample was 13–14 at the D19S433 locus and 9–9.3 at the TH01 locus. This genotype is common because the frequencies of the corresponding alleles are all between 0.144 and 0.350.

The 12 mtDNA LRs estimated for the 17 samples with fewer than four nDNA loci are given in Table 1. They varied from 43, for the most common haplotype, to 435, for haplotypes that were not observed in the available database of 867 European Caucasian sequences. When the genetic information from nDNA and mtDNA markers was combined, overall LRs were between 290 (sample U19 after 120 days) and more than two million (samples U3 after 6 days and U10 after 120 days).
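The mtDNA values above can be illustrated with a short Python sketch. It assumes that the sampling correction takes the form (x + 2)/(n + 2) for a haplotype observed x times among n database sequences; this form is not spelled out in the text, but it reproduces the reported maximum for an unobserved haplotype. The function names are hypothetical, and the overall LR is simply the product of the two marker-specific LRs, as stated in the statistical analysis.

```python
def mtdna_lr(x, n):
    """LR for a matching mtDNA haplotype.

    x: number of times the haplotype was observed in the database
    n: number of haplotypes in the database
    Assumed frequency estimate: (x + 2) / (n + 2); the LR is its reciprocal.
    """
    if n < 0 or x < 0 or x > n:
        raise ValueError("require 0 <= x <= n")
    return (n + 2) / (x + 2)


def overall_lr(ndna_lr, mtdna_lr_value):
    """Combine nDNA and mtDNA LRs, treating the two markers as independent."""
    return ndna_lr * mtdna_lr_value


# Unobserved haplotype in the 867-sequence database used in this study
lr_unobserved = mtdna_lr(0, 867)  # 434.5, in line with the reported 435
```

Under the same assumption, a frequently observed haplotype yields a much lower value, which is why the mtDNA contribution to the overall LR stays modest whatever the quality of the sequence.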

Discussion

Many substances such as illicit drugs or doping products are excreted by the kidneys and can be detected in urine. Since they are easy to collect, urine samples are often used in clinical, toxicological or doping control tests. These samples can be subject to accidental mixing or intentional manipulations. It is therefore of prime interest to provide a way to ascertain identity. In this study, we used a controlled storage experiment in order to evaluate the potential of identifying urine samples from European Caucasian volunteers using nDNA and mtDNA markers.

Several factors may influence the genetic identification process. A major one is the amount of DNA that can be recovered from samples. This amount mainly depends on the nature of the sample considered, the storage conditions and the efficiency of the protocol used to extract DNA. Urine generally contains small numbers of nucleated cells and is thus difficult to genotype [e.g. 1, 2, 6–10]. Moreover, the 20 samples tested in this work were stored at 4°C over a 4-month storage period, which can be detrimental to the little DNA they contain. Accordingly, both DNA concentrations and numbers of loci scored for the SGM Plus profiles decreased over time for most of the samples (Figs. 1 and 2). Some samples apparently did not follow this trend: more loci were scored for urines U3, U17 and U20 after 120 days than after 60 days (Table 1). The same phenomenon was observed when these samples were extracted with the silica-based protocol (results not shown). This peculiarity may result from the concentration step of urine that preceded the DNA extraction. It is likely that the centrifuge filters (Centricon Plus-20) used after 4 months were more efficient at concentrating the samples than the centrifugation alone used after 2 months (see Materials and methods). It is also possible that centrifuge filters removed bacteria or fungi whose DNA may inhibit the amplification of human DNA [7]. Concerning the choice of the DNA extraction protocol, it is worth mentioning that about four times more nDNA was recovered from urine samples when using a P/C protocol rather than a silica-based extraction method (Fig. 1). Interestingly, the use of an increased number of PCR cycles [24] together with a multi-tube approach [13] apparently compensated for this difference because similar numbers of loci were scored for the nDNA profiles obtained with both extraction protocols (Fig. 2).

Another important factor that influences the identification process is the choice of the genetic marker. In this study, nDNA LRs were generally high, but decreased with increasing storage period (Table 1). Conversely, mtDNA LRs were generally low, but did not change over time. In fact, sequences of the HVI and HVII regions could be obtained for all the samples tested, whatever the period considered. From a practical point of view, the establishment of one nDNA profile requires one PCR reaction with a commercial kit containing multiplexed loci. In contrast, at least two amplifications and two sequencing reactions (i.e. a minimum of four PCR reactions) are needed to obtain HVI and HVII sequences for a single individual. Note that some laboratories may use up to eight or more PCRs to amplify and sequence the entire control region [e.g. 25]. The number of PCRs needed to obtain a single nDNA or mtDNA profile has to be further increased because genetic profiles should be at least duplicated [13, 26, 27] and the analyses monitored with several negative and positive controls [e.g. 27]. More effort is also needed to read and edit several hundred nucleotides from the mtDNA hypervariable regions, compared to reading 10 to 15 loci for the nDNA markers. Current doubts concerning the reliability of mtDNA databases reflect the complexity of analysing this haploid marker. In fact, sequencing and reporting errors, as well as artificial recombination between several regions from different samples, were documented in several data sets [e.g. 28, 29]. When dealing with samples from doping control cases or crime scenes, the occurrence of “artificial sequences” is likely to result in false exclusions. In the case of inclusions, the maximal value of mtDNA LRs would be overestimated because the occurrence of artificial sequences would increase the size of the database. In other words, this would give more support to the hypothesis that the sample originated from the suspect rather than from someone else from the reference population. In the present study, the maximum LR was 435 for a database of 867 European Caucasian haplotypes. It would have been 5,001 for a database containing 10,000 sequences. According to the results presented in Table 1, nDNA LRs obtained with only three loci can easily exceed this value. The exclusion power of mtDNA markers can be greatly enhanced by using polymorphisms located outside the HVI and HVII regions [e.g. 30, 31]. Nevertheless, the strength of a match between two samples (i.e. the inclusion power) would not change until this extra information is integrated into large, extended, error-free databases. It thus seems more strategic to systematically analyse biological samples that are poor in nucleated cells and/or that were stored under poor conditions with nDNA markers before undertaking mtDNA analyses. Within this context, nDNA STRs with reduced-size amplicons and/or LCN approaches could help recover some loci for difficult samples [e.g. 32, 33].
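The dependence of the maximum mtDNA LR on database size can be written out explicitly. The worked equation below assumes a frequency estimate of (x + 2)/(n + 2) for a haplotype seen x times among n sequences, a form that is not stated in the text but is consistent with both figures quoted above; the maximum corresponds to x = 0.

```latex
\mathrm{LR}_{\max} = \frac{n + 2}{0 + 2} = \frac{n + 2}{2},
\qquad
\frac{867 + 2}{2} = 434.5 \approx 435,
\qquad
\frac{10\,000 + 2}{2} = 5\,001
```

Under this assumption the maximum LR grows only linearly with the number of sequences in the database, whereas nDNA LRs grow multiplicatively with each additional locus scored.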