Abstract
The study evaluated whether enlarged batteries of STR markers in kinship analysis provide LR values suitable for discriminating relatives from non-relatives, in comparison with conventionally used STR panels. The presence of LD among some loci and its effects on LR values were also assessed. Three hundred pairs of related and unrelated individuals, separated by 1–3 generations and residing in North Italy, were genotyped with the Investigator HDplex STR kit (Qiagen), AmpFlSTR Identifiler (Applied Biosystems), and PowerPlex Fusion System (Promega). Loci and alleles shared between each pair and within groups of relatives were compared. In addition, combined LR values with and without loci in LD, sensitivity, and specificity were calculated for each commercial kit and their combinations. Full siblings displayed the largest number of shared loci and alleles, with a proportion of LR ≥ 10 results significantly higher than in other degrees of relatedness and, consequently, the lowest percentage of inconclusive and false negative results. Only minor differences were detected in the combined LR distributions after including or omitting loci in LD, and these became appreciable only when analyzing more distant relative pairs. The implementation of additional STRs in the LR calculation allowed a complete and robust discrimination between relatives and non-relatives only for full siblings, by removing the typical uncertainty of the “grey zone”, while this was not achieved for other degrees of relatedness. Furthermore, the presence of loci in LD does not seem to significantly affect LR distributions within each generation.
Introduction
Personal identification in mass disasters and complex kinship analyses are among the most challenging tasks for forensic investigators. In these instances, the analysis may benefit from DNA testing of one or more close living relatives of the deceased. Obtaining reliable and statistically significant likelihood ratio (LR) values would, however, require a highly informative pool of autosomal markers, allowing unbiased detection of identical-by-descent (IBD) variants across different degrees of relatedness, such as full siblings, half-siblings, grandparents-grandchildren, uncles-nephews, and first-degree cousins. To improve the power of discrimination and contribute to overall LR values in complex relationships, increasing consideration is being given to expanding routinely used autosomal short tandem repeat (STR) sets of 15–22 markers with additional informative loci [1–6].
Recently, alongside the development of new multiplexes bearing enlarged panels of 21 (GlobalFiler Kit, Life Technologies; Investigator 24plex QS Kit, Qiagen) and 22 (PowerPlex Fusion System, Promega) autosomal markers, new supplementary STR sets have been produced [7, 8]. These can complement standard commercial kits and, at the same time, provide an efficient typing control on amplified products through shared markers. To date, sets of seven (PowerPlex CS7 System, Promega) and 12 supplementary STRs (Investigator HDplex Kit, Qiagen) have been validated.
Previous studies assessing the potential of these expanded and supplementary multiplex kits in discerning familial relationships have been published, mainly based on simulations [7–10].
In this work, a large sample of relative pairs originating from Northeast Italy and spanning three generations was assessed by means of two commercial kits: the first widely employed kit (AmpFlSTR Identifiler, hereafter Identifiler, Applied Biosystems) and the most enlarged panel currently available (PowerPlex Fusion System, hereafter Fusion, Promega). In addition, all subjects were typed with the Investigator HDplex kit (hereafter HDplex, Qiagen), which includes the highest number of supplementary markers currently available. To evaluate the additional STRs of HDplex, its use as a stand-alone panel for discerning parental relationship was also assessed. Overall assay performance and the LR distributions provided by each multiplex kit, alone and in different combinations, were obtained.
Furthermore, when considering the two extended sets of 26 (Identifiler + HDplex) and 32 markers (Fusion + HDplex), some STRs lie close to each other on the same chromosome and, therefore, some degree of linkage disequilibrium (LD) was reasonably expected [11–15]. Hence, we investigated how the presence and/or absence of loci in LD modulated the resulting overall LR distributions within each generation of relatives.
Materials and methods
Three hundred known pairs of relatives, divided into 128 full sibling pairs, 19 half-sibling pairs, 28 grandparent-grandchild pairs, 79 uncle-nephew pairs, and 46 cousin pairs were analyzed. From these, an equal number of unrelated pairs was generated by random selection. All individuals tested gave written informed consent and the study was approved by the Institutional Review Board of the Department of Public Health and Community Medicine, University of Verona.
Familial relationships declared by study participants were verified by collecting the largest possible cohort of available relatives within each pedigree and by using gonosomal markers (Investigator Argus X-12, Qiagen; PowerPlex Y23 System, Promega) to corroborate the statements.
Genomic DNA was extracted from buccal swabs with the QIAamp DNA Mini kit (Qiagen) according to the manufacturer’s instructions. The quantity of recovered DNA was determined using the Quantifiler Duo DNA Quantification Kit (Applied Biosystems) by real-time PCR performed on the 7500 Real-Time PCR System with HID Analysis software v1.2. PCR amplification was carried out using the Identifiler (Applied Biosystems), Fusion (Promega), and HDplex (Qiagen) kits in a GeneAmp 9700 PCR System (Applied Biosystems), following the manufacturer’s protocol for each kit. Amplified products were detected by capillary electrophoresis on an ABI Prism 3130 Genetic Analyzer (Applied Biosystems) using the size standards and reference allelic ladders provided with each kit. Genotypes were assigned automatically with GeneMapper ID-X Software v.1.2 (Applied Biosystems).
Statistics
LR values were calculated for each pair by comparing the “related” versus “unrelated” hypotheses [16], using allele frequencies of the North Italian population [17–19]. Population substructure and mutational events were not considered in the calculation. All values were computed in Microsoft Excel.
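As an illustration of the kind of pairwise calculation described above, the following is a minimal sketch, not the authors' spreadsheet. It assumes the standard IBD-coefficient formulation (k0, k1, k2), Hardy-Weinberg proportions, independent loci (product rule), and no mutation or substructure correction. The function names, the `IBD` table, and the allele labels are hypothetical.

```python
from itertools import product
from math import prod

# Assumed IBD coefficients (k0, k1, k2): probabilities of sharing 0, 1, or 2 alleles IBD.
IBD = {
    "full_siblings": (0.25, 0.50, 0.25),
    "second_degree": (0.50, 0.50, 0.00),  # half-siblings, grandparent-grandchild, uncle-nephew
    "first_cousins": (0.75, 0.25, 0.00),
}

def genotype_prob(g, freq):
    """Hardy-Weinberg probability of an unordered genotype (a, b)."""
    a, b = g
    return freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]

def one_ibd_prob(g1, g2, freq):
    """P(g1, g2 | exactly one allele IBD): enumerate the shared allele s and the
    two independently drawn non-shared alleles x (person 1) and y (person 2)."""
    alleles = set(g1) | set(g2)
    t1, t2 = tuple(sorted(g1)), tuple(sorted(g2))
    total = 0.0
    for s, x, y in product(alleles, repeat=3):
        if tuple(sorted((s, x))) == t1 and tuple(sorted((s, y))) == t2:
            total += freq[s] * freq[x] * freq[y]
    return total

def locus_lr(g1, g2, freq, k):
    """Single-locus LR: P(genotypes | related with coefficients k) / P(genotypes | unrelated)."""
    k0, k1, k2 = k
    p1, p2 = genotype_prob(g1, freq), genotype_prob(g2, freq)
    same = tuple(sorted(g1)) == tuple(sorted(g2))
    numerator = k0 * p1 * p2 + k1 * one_ibd_prob(g1, g2, freq) + k2 * (p1 if same else 0.0)
    return numerator / (p1 * p2)

def combined_lr(pair_genotypes, freqs, k):
    """Product of single-locus LRs; valid only if the loci are treated as independent."""
    return prod(locus_lr(g1, g2, freqs[locus], k)
                for locus, (g1, g2) in pair_genotypes.items())
```

For two individuals who are both homozygous for an allele of frequency 0.1, the full-sibling LR at that locus is 0.25 + 0.5/0.1 + 0.25/0.01 = 30.25, whereas a locus with no shared alleles contributes a factor of k0 (0.25 for full siblings, 0.5 for second-degree relatives).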
The numbers of loci and alleles shared by each pair were determined by counting. A score of 2, 1, or 0 was given when 2, 1, or 0 alleles were shared at a locus, respectively.
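The counting scheme can be sketched as follows. This is an illustrative reading of the scoring rule, assuming that alleles are compared identical-by-state and that a homozygote matched against a heterozygote carrying the same allele scores 1. The function names and locus labels are hypothetical.

```python
from collections import Counter

def shared_alleles(g1, g2):
    """Number of alleles (0, 1, or 2) shared at one locus, as a multiset intersection."""
    return sum((Counter(g1) & Counter(g2)).values())

def sharing_summary(pair_genotypes):
    """Total shared alleles and number of loci with at least one shared allele."""
    scores = [shared_alleles(g1, g2) for g1, g2 in pair_genotypes.values()]
    return {"alleles_shared": sum(scores), "loci_shared": sum(s > 0 for s in scores)}

# Hypothetical genotypes for one pair at three loci.
example_pair = {
    "locus_1": ((12, 12), (12, 13)),  # one allele shared
    "locus_2": ((9, 10), (9, 10)),    # two alleles shared
    "locus_3": ((7, 8), (9, 11)),     # none shared
}
summary = sharing_summary(example_pair)
```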
“False negative” and “inconclusive” designations were given when true relative pairs yielded LR ≤ 1 and 1 < LR ≤ 10, respectively, while “false positive” was assigned to truly unrelated pairs that nevertheless provided LR ≥ 1.
The sensitivity, i.e. the probability of assigning two related subjects correctly as relatives, and the specificity, i.e. the probability of assigning two unrelated persons correctly as unrelated, were determined for an LR cutoff value of 1.
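A minimal sketch of these two measures at a given cutoff is shown below. It follows the designations above (related pairs with LR at or below the cutoff count as false negatives, unrelated pairs with LR at or above it as false positives). The function name and the example LR values are hypothetical and are not data from this study.

```python
def sensitivity_specificity(related_lrs, unrelated_lrs, cutoff=1.0):
    """Return (sensitivity, specificity) for combined LR values at the given cutoff."""
    true_positives = sum(lr > cutoff for lr in related_lrs)    # relatives correctly supported
    true_negatives = sum(lr < cutoff for lr in unrelated_lrs)  # unrelated correctly unsupported
    return true_positives / len(related_lrs), true_negatives / len(unrelated_lrs)

# Hypothetical combined LR values, for illustration only.
sens, spec = sensitivity_specificity([0.5, 2.0, 50.0, 300.0], [0.01, 0.2, 3.0, 0.9])
```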
The term “grey zone” refers to the range of LR values where uncertainty exists owing to the overlap between the LR distribution curves of true relatives and truly unrelated pairs.
Allele frequencies and other relevant biostatistical forensic parameters from a Northern Italy population were previously obtained using the Identifiler (Applied Biosystems) (D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, and FGA), Fusion (Promega) (D3S1358, D1S1656, D2S441, D10S1248, D13S317, Penta E, D16S539, D18S51, D2S1338, CSF1PO, Penta D, TH01, vWA, D21S11, D7S820, D5S818, TPOX, DYS391, D8S1179, D12S391, D19S433, FGA, and D22S1045) and HDplex (Qiagen) kits (SE33, D2S1360, D3S1744, D4S2366, D5S2500, D6S474, D7S1517, D8S1132, D10S2325, D21S2055) [17–19].
Results
In order to evaluate the impact of the additional loci on the resulting LR values, a comparison of LR distributions between 15 (Identifiler, Applied Biosystems) and 22 STRs (Fusion, Promega), and in conjunction with HDplex (Qiagen) supplementary loci, was performed. For this last panel, its use as a stand-alone set for discerning parental relationship was also assessed.
Number of loci and alleles shared
Among the three generations assessed, full siblings shared, on average, the highest number of loci and alleles in comparison with other degrees of relatedness. This was reasonably expected, since they may share IBD at least one fourth of their parental genome.
The average number of alleles shared by first-degree relatives increased progressively from 14 to 19, 27, 32, and 40 when the HDplex, Identifiler, Fusion, Identifiler + HDplex, and Fusion + HDplex kit combinations were used, respectively. Similarly, the average number of loci shared increased from 10 to 13, 19, 22, and 28 in the same order. However, when these data were normalized to the number of loci included per kit, the values did not appear to differ significantly (t test, p < 0.01). Therefore, the number of STRs used in the calculation does not appear to affect overall LR values.
Nevertheless, a tenfold higher difference in the average number of shared loci and alleles between full-siblings and first-degree cousins was observed when switching from Identifiler + HDplex to Fusion + HDplex (Table S1A-S1B).
LR value
Regarding posterior probability values, the lower LR cut-off of 200 (W = 99.5 %, assuming a 50 % prior probability) adopted in classical paternity testing involving a mother-father-child trio [20] has not been established for kinship analyses aimed at discriminating between related and unrelated subjects.
In this study, an LR threshold of 10, corresponding to W = 90.9 %, was adopted. All values falling below it were defined as “inconclusive (1 < LR ≤ 10)/negative (0 < LR ≤ 1): no support for relationship”, while those above it were defined as “positive: very strong evidence”. The latter were further divided, according to LR value, into four categories: moderate (10 < LR ≤ 100), moderately strong (100 < LR ≤ 1000), strong (1000 < LR ≤ 10000), and very strong (LR > 10000) [21, 22].
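The relation between LR and W, and the verbal categories listed above, can be sketched as follows. The conversion is the usual Bayesian update (posterior odds = LR × prior odds). The function names are hypothetical and the category boundaries are those given in the text.

```python
def posterior_probability(lr, prior=0.5):
    """Posterior probability W of relatedness from an LR and a prior probability."""
    prior_odds = prior / (1 - prior)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1 + posterior_odds)

def verbal_category(lr):
    """Map a combined LR to the categories used in this study."""
    if lr <= 1:
        return "negative"
    if lr <= 10:
        return "inconclusive"
    if lr <= 100:
        return "moderate"
    if lr <= 1000:
        return "moderately strong"
    if lr <= 10000:
        return "strong"
    return "very strong"
```

With a prior of 0.5, W reduces to LR / (LR + 1), so LR = 10 gives 10/11 ≈ 90.9 % and LR = 200 gives 200/201 ≈ 99.5 %.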
When assessing STR panels singly, the Fusion kit provided the highest proportion of pairs bearing LR ≥ 10 among the three generations considered, with a decreasing trend from full-siblings to first degree cousins. Only within this last group, HDplex kit performed slightly better than Fusion (24 % of HDplex vs. 20 % of Fusion, Table S1A-S1B).
Of the two extended STR combinations, Identifiler + HDplex (n = 26 STRs) and Fusion + HDplex (n = 32 STRs), the latter was found to be the optimal set of markers, owing to the largest number of STRs included among all sets.
Within full siblings, the proportion of pairs with LR ≥ 10 rose from 97 % with Fusion to 100 % with Fusion + HDplex, while a 20 % increase was observed for each of the last two degrees of relatedness.
LR values between 0–1 and 1–10 were defined as “negative” and “inconclusive”, respectively, and true relative pairs wrongly included in these intervals may generate false negative results.
The percentages of negative/inconclusive results per generation were highest for the Identifiler kit and, regardless of which STR set was considered, these percentages rose substantially within half-siblings and reached their maximum among first-degree cousins (50 %).
Analogous considerations could be made for Identifiler + HDplex and Fusion + HDplex associations, and notably, the second set did not give any negative/inconclusive results within the first generation of relatives, thus avoiding interpretation biases (Table 1 and Fig. S1).
With regard to the three generations of unrelated pairs, the percentages of results falling in the inconclusive zone (1 ≤ LR < 10) markedly increased from the first to the third generation when using Identifiler, HDplex and Fusion sets. False positive results (LR ≥ 10), by means of the Identifiler kit, were revealed in unrelated brothers, half-brothers, and uncles-nephews groups, and were limited to unrelated uncles-nephews and first cousins when using the Fusion kit (Table S1A-S1B).
When considering extended STR combinations, the percentages of unrelated pairs falling in the inconclusive area decreased in comparison to single kits, while no false positives were detected.
Exact and rough LR value
With the use of enlarged batteries of STR markers, the odds of finding loci closely spaced on the same chromosome become greater, which may complicate the application of the product rule for LR calculation in kinship analyses [9, 14]. As previously reported, the presence of linked loci may require either excluding the less informative locus [23] or, alternatively, adopting a more conservative approach that includes the recombination rate in the LR computation [14]. Based on former studies, loci closer than 50 centimorgans (cM) were identified [11–15], and the less informative among them were removed from the calculations, according to the statistical parameters (PIC, Het, PE, and PD) determined in the North Italian population [17–19]. Specifically, these were D5S818 (Identifiler), D6S474 (HDplex), and vWA, Penta D, and D5S818 (Fusion). A total of six and seven STRs were excluded from the Identifiler + HDplex (D5S818, TPOX, D6S474, D8S1179, vWA, D21S11) and Fusion + HDplex sets (same loci plus Penta D), respectively.
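The exclusion strategy described above can be sketched as follows. This is an assumption-laden illustration: it uses a single informativeness score per locus, whereas the study weighed several parameters (PIC, Het, PE, PD). The locus names, genetic distances, and scores are invented placeholders, not values from the cited population data.

```python
def drop_linked_loci(loci, linked_pairs, informativeness, max_cm=50.0):
    """For each pair of loci closer than max_cm, drop the less informative one."""
    kept = set(loci)
    for locus_a, locus_b, distance_cm in linked_pairs:
        if distance_cm < max_cm and locus_a in kept and locus_b in kept:
            weaker = min((locus_a, locus_b), key=lambda name: informativeness[name])
            kept.discard(weaker)
    return [name for name in loci if name in kept]  # preserve the original order

# Hypothetical panel: placeholder names, distances (cM), and scores.
panel = ["LOCUS_A", "LOCUS_B", "LOCUS_C", "LOCUS_D"]
pairs = [("LOCUS_A", "LOCUS_B", 12.0), ("LOCUS_C", "LOCUS_D", 80.0)]
scores = {"LOCUS_A": 0.80, "LOCUS_B": 0.60, "LOCUS_C": 0.70, "LOCUS_D": 0.90}
reduced_panel = drop_linked_loci(panel, pairs, scores)
```

The reduced panel can then be used for a product-rule LR, at the cost of discarding whatever information the excluded loci carried.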
The number of alleles and loci shared, as well as LR distributions considering or excluding linkage (exact and rough, respectively), seemed to be very similar (Table S1A-S1B and Fig. S1) between relatives and non-relatives, with only minor discrepancies for some scenarios.
With 11 STRs (HDplex), the percentage of true half-sibling pairs with 1 ≤ LR < 10 rose from 26 % (exact) to 32 % (rough) while, for unrelated half-siblings, the shift was from 0 (exact) to 5 % (rough). Interestingly, after excluding 3 loci in LD from the set of 22 STRs (Fusion), the percentage of grandparent-grandchild pairs with 1 ≤ LR < 10 rose from 28 to 43 % (Table 1).
Focusing on the extended set of STRs Identifiler + HDplex (20 markers, no linkage), the most significant increase was detected for grandparent-grandchild pairs falling within the “inconclusive” zone, where the number of relatives rose from 18 to 25 % and that of non-relatives from 0 to 4 %.
In that same group of relatives, the combination Fusion + HDplex (25 STRs, no linkage) showed similar LR changes for related (from 11 to 25 %) and unrelated (from 0 to 4 %) pairs. These analogous increments were also noted for half-siblings (Table 1).
To sum up, the reduction of combined LR values owing to the exclusion of loci in LD from the calculation resulted in increased percentages of related and unrelated pairs falling in the area of uncertainty (inconclusive) and in a larger overlap of the respective LR distributions, with a consequent rise in uncertainty.
However, in some scenarios, the removal of the less informative among linked loci produced an unexpected decrease in the overlap between related and unrelated pairs, and the reduction in the number of STRs mostly had negligible effects on the LR computation, since only poorly discriminative markers were excluded. For example, within the interval 1 ≤ LR < 10, a decrease in the number of half-sibling pairs was observed both with Fusion (exact 27 %, rough 16 %) and with Identifiler + HDplex (exact 16 %, rough 5 %). For the latter, the reduction also affected the unrelated pairs (exact 5 %, rough 0 %) (Table 1).
Previous studies defined, as “grey zone”, the range of uninformative results with LR values from 0.067 to 10.3 [24–26]. Data from this study are in line with the literature, with few exceptions among non-relatives, where values of LR = 100 could be observed by means of Identifiler and Fusion kits (Table 1).
Comparison with other studies
Restricting the analysis to full and half-siblings, results from this study were compared with those of other empirical/simulated studies [27, 28] using 13/15 STRs and 20/22 STRs (Table 2).
According to empirical data, there is consistent concordance, with at least 81 % of full-sibling pairs showing LR ≥ 10 (W ≥ 90.9 %) with 15 STRs (Identifiler/PowerPlex 16HS) and 90 % with 20/22 STRs (PowerPlex 21/Fusion). Some discrepancies are observed for LR ≤ 1 (W ≤ 50 %), where the percentage is three to seven times higher in the study by von Wurmb-Schwark et al. (2015) [27]. The same considerations apply to half-sibling pairs.
In agreement with simulated data, robust concordance could be detected among the 13/15 STR (combined DNA index system (CODIS), Identifiler, and PowerPlex 16HS) and 20/22 STR (CODIS + ESS, Fusion, and PowerPlex 21) datasets [27, 28].
Sensitivity and specificity
Of the three commercial kits assessed, HDplex was the STR set with the highest sensitivity and specificity, even with the exclusion of the D6S474 marker. Of the two STR combinations, Identifiler + HDplex and Fusion + HDplex, both performed equally in terms of specificity, although the second panel retained the greatest sensitivity (Table S2). These findings are in line with previous studies based on simulated data [6–8, 12].
Discussion
The aim of this work was to define which among three commercial multiplex kits and their combinations (from 12 to 32 STRs) most efficiently discriminated relatives from non-relatives up to the 3rd degree of relatedness, without considering parent-child pairs.
As expected, the extended STR combinations (Identifiler + HDplex and Fusion + HDplex) positively influenced the resolution of the kinship relationship, bolstered by the supplementary markers included.
This result is in accordance with literature data [10] reporting that a shift from 15 to 24 loci improved the separation between relatives and non-relatives and narrowed the area of overlap between the false negative and false positive curves, defined as the “grey zone”. In this study, the effect was greatest within full siblings (who share a higher number of IBD loci) when moving from 12 (HDplex) to 32 STRs (Fusion + HDplex) and became almost null among first cousins. This was expected, owing to the reduced number of IBD alleles with increasing generational distance, and may be partially compensated by increasing the number of relative pairs considered [9]. A final consideration concerns the statistical parameters (Het, PIC, PD, PE) characterizing the STR loci investigated. A highly polymorphic and heterozygous locus, with allelic variants well distributed in the population, adds strength to the STR multiplex in which it is included. Such a locus may represent a valid supplementary STR for difficult kinship analyses. Conversely, the more poorly discriminative loci are included, the more negatively the resulting LR will be influenced. Indeed, the HDplex kit, with only 12 STRs, combines loci with Het values ≥ 0.7 and a high PD. This, in turn, positively impacted sensitivity and specificity, with better performance than Identifiler and Fusion, and suggests that its suitability in paternity cases may not be limited to supplementing current commercial kits but may extend to use as a stand-alone STR panel.
With regard to LD, we assessed how the exclusion of linked loci from the LR calculation affected the differentiation between related and unrelated pairs, potentially increasing the chances of observing false positives and false negatives. The calculated posterior probability was found to be lower for all relationships assessed. After removing loci in LD, no major variation in related/unrelated differentiation was revealed, even when excluding 7 loci from the Fusion + HDplex combination (32 loci). Among the kits assessed here, HDplex therefore appeared to be the most informative single set of autosomal loci for kinship testing.
References
Wenk RE, Chiafari FA, Gorlin J et al (2003) Better tools are needed for parentage and kinship studies. Transfusion 43:979–981
Poetsch M, Lüdcke C, Repenning A et al (2006) The problem of single parent/child paternity analysis-practical results involving 336 children and 348 unrelated men. Forensic Sci Int 159:98–103
Betz T, Immel UD, Kleiber M et al (2007) "Paterniplex", a highly discriminative decaplex STR multiplex tailored for investigating special problems in paternity testing. Electrophoresis 28:3868–3874
Lee JC, Lin YY, Tsai LC et al (2012) A novel strategy for sibship determination in trio sibling model. Croat Med J 53:336–342
Carboni I, Iozzi S, Nutini AL et al (2014) Improving complex kinship analyses with additional STR loci. Electrophoresis 35:3145–3151
Phillips C, Gelabert-Besada M, Fernandez-Formoso L et al (2014) "New turns from old STaRs": enhancing the capabilities of forensic short tandem repeat analysis. Electrophoresis 35:3173–3187
Phillips C, Fernandez-Formoso L, Gelabert-Besada M et al (2014) Global population variability in Qiagen Investigator HDplex STRs. Forensic Sci Int Genet 8:36–43
Tillmar AO, Mostad P (2014) Choosing supplementary markers in forensic casework. Forensic Sci Int Genet 13:128–133
Nothnagel M, Schmidtke J, Krawczak M (2010) Potentials and limits of pairwise kinship analysis using autosomal short tandem repeat loci. Int J Legal Med 124:205–215
Tamura T, Osawa M, Ochiai E et al (2015) Evaluation of advanced multiplex short tandem repeat systems in pairwise kinship analysis. Legal Med 17:320–325
O'Connor KL, Hill CR, Vallone PM et al (2011) Linkage disequilibrium analysis of D12S391 and vWA in U.S. population and paternity samples. Forensic Sci Int Genet 5:538–540
Westen AA, Haned H, Grol LJ et al (2012) Combining results of forensic STR kits: HDplex validation including allelic association and linkage testing with NGM and Identifiler loci. Int J Legal Med 126:781–789
Phillips C, Ballard D, Gill P et al (2012) The recombination landscape around forensic STRs: accurate measurement of genetic distances between syntenic STR pairs using HapMap high density SNP data. Forensic Sci Int Genet 6:354–365
Gill P, Phillips C, McGovern C et al (2012) An evaluation of potential allelic association between the STRs vWA and D12S391: implications in criminal casework and applications to short pedigrees. Forensic Sci Int Genet 6:477–486
Wu W, Hao H, Liu Q et al (2014) Analysis of linkage and linkage disequilibrium for syntenic STRs on 12 chromosomes. Int J Legal Med 128:735–739
Wenk RE, Traver M, Chiafari FA (1996) Determination of sibship in any two persons. Transfusion 36:259–262
Presciuttini S, Cerri N, Turrina S et al (2006) Validation of a large Italian Database of 15 STR loci. Forensic Sci Int 156:266–268
Turrina S, Ferrian M, Caratti S, De Leo D (2015) Investigator HDplex markers: allele frequencies and mutational events in a North Italian population. Int J Legal Med 129:730–733
Turrina S, Ferrian M, Caratti S, De Leo D (2014) Evaluation of genetic parameters of 22 autosomal STR loci (PowerPlex® Fusion System) in a population sample from Northern Italy. Int J Legal Med 128:281–283
Gjertson DW, Brenner CH, Baur MP et al (2007) ISFG:Recommendations on biostatistics in paternity testing. Forensic Sci Int Genet 1:223–231
Evett IW, Jackson G, Lambert JA et al (2000) The impact of the principles of evidence interpretation on the structure and content of statements. Sci Justice 40:233–239
Association of Forensic Science Providers (2009) Standards for the formulation of evaluative forensic science expert opinion. Sci Justice 49:161–164
Budowle B, Ge J, Chakraborty R et al (2011) Population genetic analyses of the NGM STR loci. Int J Legal Med 125:101–109
Tzeng CH, Lyou JY, Chen YR et al (2000) Determination of sibship by PCR-amplified short tandem repeat analysis in Taiwan. Transfusion 40:840–845
Giroti RI, Verma S, Singh K et al (2007) A grey zone approach for evaluation of 15 short tandem repeat loci in sibship analysis: a pilot study in Indian subjects. J Forensic Leg Med 14:261–265
Musanovic J, Filipovska-Musanovic M, Kovacevic L et al (2012) Determination of combined sibship indices "gray zone" using 15 STR loci for central Bosnian human population. Mol Biol Rep 39:5195–5200
von Wurmb-Schwark N, Podruks E, Schwark T et al (2015) About the power of biostatistics in sibling analysis-comparison of empirical and simulated data. Int J Legal Med 129(6):1201–1209
O’Connor KL, Interpretation of DNA typing Results for kinship analysis. 2011 [cited 2011 Jan 25]. Available from: http://www.cstl.nist.gov/strbase/pub_pres/OConnor_USCIS_ interpretation%20of%20DNA.pdf
Ethics declarations
All individuals tested gave written informed consent, and the study was approved by the Institutional Review Board of the Department of Public Health and Community Medicine, University of Verona.
Conflict of interest
The authors declare that they have no competing interests.
Electronic supplementary material
Figure S1
Distribution of log10LR values calculated under different kinship scenarios either with (exact) or without (rough) loci in linkage disequilibrium. HD = Investigator® HDplex kit (Qiagen), ID = AmpFlSTR® Identifiler PCR Amplification kit (Applied Biosystems), FUS = PowerPlex® Fusion System (Promega). (7Z 537 kb)
ESM 2
(XLSX 26 kb)
Turrina, S., Ferrian, M., Caratti, S. et al. Kinship analysis: assessment of related vs unrelated based on defined pedigrees. Int J Legal Med 130, 113–119 (2016). https://doi.org/10.1007/s00414-015-1290-3