Introduction

Personal identification in mass disasters and complex kinship analyses represent some of the most challenging tasks for forensic investigators. In these instances, the analysis may benefit from DNA typing of one or more close living relatives of the deceased. Obtaining reliable and statistically significant likelihood ratio (LR) values would, however, require a highly informative pool of autosomal markers, allowing unbiased detection of identical-by-descent (IBD) alleles across different degrees of relatedness, such as full siblings, half siblings, grandparents-grandchildren, uncles-nephews, and first cousins. To improve the power of discrimination and make a valuable contribution to overall LR values in complex relationships, increasing consideration is being given to expanding the routinely used autosomal short tandem repeat (STR) sets of 15–22 markers with additional informative loci [1–6].

Recently, alongside the development of new multiplexes with enlarged panels of 21 (GlobalFiler Kit, Life Technologies; Investigator 24plex QS Kit, Qiagen) and 22 (PowerPlex Fusion System, Promega) autosomal markers, new supplementary STR sets have been produced [7, 8]. These can be used to complement standard commercial kits and, at the same time, provide an efficient typing check on amplified products through the markers they share. To date, supplementary sets of seven (PowerPlex CS7 System, Promega) and 12 STRs (Investigator HDplex Kit, Qiagen) have been validated.

Previous studies, mainly based on simulations, have assessed the potential of these expanded and supplementary multiplex kits to discern familial relationships [7–10].

In this work, a large sample of relative pairs from Northeast Italy, spanning three generations, was typed with two commercial kits: the first widely employed kit (AmpFlSTR Identifiler, Identifiler, Applied Biosystems) and the most extended panel currently available (PowerPlex Fusion System, Fusion, Promega). In addition, all subjects were typed with the Investigator HDplex kit (HDplex, Qiagen), which includes the highest number of supplementary markers currently available. To evaluate the additional STRs of HDplex, its use as a stand-alone panel for discerning kinship relationships was also assessed. Overall assay performance and the LR distributions provided by each multiplex kit, alone and in different combinations, were obtained.

Furthermore, in the two extended sets of 26 (Identifiler + HDplex) and 32 markers (Fusion + HDplex), some STRs lie close to each other on the same chromosome, so some degree of linkage disequilibrium (LD) was reasonably expected [11–15]. Hence, we investigated how including or excluding loci in LD modulated the resulting overall LR distributions within each generation of relatives.

Materials and methods

Three hundred known pairs of relatives, divided into 128 full sibling pairs, 19 half-sibling pairs, 28 grandparent-grandchild pairs, 79 uncle-nephew pairs, and 46 cousin pairs were analyzed. From these, an equal number of unrelated pairs was generated by random selection. All individuals tested gave written informed consent and the study was approved by the Institutional Review Board of the Department of Public Health and Community Medicine, University of Verona.
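The text does not specify how the random unrelated pairs were drawn. A minimal Python sketch, assuming that each unrelated pair is formed by two individuals from different pedigrees (the `(sample_id, family_id)` layout and the function name are hypothetical), could look like this:

```python
import random


def random_unrelated_pairs(individuals, n_pairs, seed=0, max_attempts=100_000):
    """Draw n_pairs distinct pairs of individuals belonging to different families.

    individuals: list of (sample_id, family_id) tuples (hypothetical layout).
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    pairs = set()
    for _ in range(max_attempts):
        if len(pairs) == n_pairs:
            break
        (id1, fam1), (id2, fam2) = rng.sample(individuals, 2)
        if fam1 != fam2:  # only pairs from different pedigrees count as unrelated
            pairs.add(tuple(sorted((id1, id2))))
    if len(pairs) < n_pairs:
        raise ValueError("not enough cross-family pairs could be drawn")
    return sorted(pairs)


cohort = [("S1", "F1"), ("S2", "F1"), ("S3", "F2"), ("S4", "F2"), ("S5", "F3"), ("S6", "F3")]
print(random_unrelated_pairs(cohort, 5))
```

Storing each pair as a sorted tuple in a set prevents the same two individuals from being counted twice in opposite order.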

Familial relationships declared by study participants were verified by collecting the largest possible cohort of available relatives within each pedigree and by using gonosomal markers (Investigator Argus X-12, Qiagen; PowerPlex Y23 System, Promega) to corroborate the statements.

Genomic DNA was extracted from buccal swabs with the QIAamp DNA Mini kit (Qiagen) according to the manufacturer’s instructions. DNA was quantified with the Quantifiler Duo DNA Quantification Kit (Applied Biosystems) by real-time PCR on the 7500 Real-Time PCR System with HID Analysis software v1.2. PCR amplification was carried out with the Identifiler (Applied Biosystems), Fusion (Promega) and HDplex (Qiagen) kits on a GeneAmp 9700 PCR System (Applied Biosystems), following each manufacturer’s protocol. Amplified products were detected by capillary electrophoresis on an ABI Prism 3130 Genetic Analyzer (Applied Biosystems) using the size standards and reference allelic ladders provided with each kit. Genotypes were assigned automatically with GeneMapper ID-X Software v.1.2 (Applied Biosystems).

Statistics

LR values were calculated for each pair by comparing the “related” versus “unrelated” hypotheses [16], using allele frequencies of the North Italian population [17–19]. Population substructure and mutations were not taken into account. All values were computed in Microsoft Excel.
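The exact LR formula used in the spreadsheets is not spelled out. A common approach for pairwise kinship is to weight the single-locus likelihood ratio by the IBD-sharing probabilities k0, k1, k2 of the hypothesized relationship and to multiply across loci (product rule). The Python sketch below assumes that formulation, together with Hardy-Weinberg equilibrium, no mutation and no population substructure, matching the simplifications stated above. The allele frequencies and the locus names "L1"/"L2" in the example are hypothetical.

```python
from math import prod

# IBD-sharing probabilities (k0, k1, k2) for non-inbred relationships
KAPPA = {
    "full_siblings": (0.25, 0.5, 0.25),
    "half_siblings": (0.5, 0.5, 0.0),
    "grandparent_grandchild": (0.5, 0.5, 0.0),
    "uncle_nephew": (0.5, 0.5, 0.0),
    "first_cousins": (0.75, 0.25, 0.0),
}


def genotype_freq(g, freqs):
    """Hardy-Weinberg frequency of an unordered genotype (a, b)."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def prob_one_ibd(g1, g2, freqs):
    """P(g2 | g1, exactly one allele IBD); each allele of g1 is the IBD one with prob 1/2."""
    a, b = g2
    total = 0.0
    for ibd in g1:
        if a == b == ibd:
            total += 0.5 * freqs[a]
        elif ibd == a:
            total += 0.5 * freqs[b]  # the non-IBD allele of g2 is drawn from the population
        elif ibd == b:
            total += 0.5 * freqs[a]
    return total


def locus_lr(g1, g2, freqs, kappa):
    """Single-locus LR = k0 + k1 * P(g2|g1,1 IBD)/P(g2) + k2 * P(g2|g1,2 IBD)/P(g2)."""
    k0, k1, k2 = kappa
    p2 = genotype_freq(g2, freqs)
    x = prob_one_ibd(g1, g2, freqs) / p2
    y = 1.0 / p2 if sorted(g1) == sorted(g2) else 0.0
    return k0 + k1 * x + k2 * y


def combined_lr(profile1, profile2, freq_table, relationship):
    """Product rule over loci typed in both profiles (assumes unlinked loci)."""
    kappa = KAPPA[relationship]
    return prod(
        locus_lr(profile1[loc], profile2[loc], freq_table[loc], kappa)
        for loc in profile1
        if loc in profile2
    )


freqs = {12: 0.1, 13: 0.2, 14: 0.3, 15: 0.4}  # hypothetical allele frequencies
profile_a = {"L1": (12, 13), "L2": (12, 12)}
profile_b = {"L1": (12, 13), "L2": (14, 15)}
table = {"L1": freqs, "L2": freqs}
print(combined_lr(profile_a, profile_b, table, "full_siblings"))  # ≈ 2.09 (8.375 × 0.25)
```

Because k2 = 0 for every relationship other than full siblings, identical heterozygous genotypes contribute much more weight under the full-sibling hypothesis, which is one reason full siblings are the easiest relationship to resolve.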

The numbers of loci and alleles shared by each pair were determined by direct counting. At each locus, a score of 2, 1, or 0 was assigned when 2, 1, or no alleles were shared, respectively.
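The scoring rule can be sketched in Python as a multiset intersection of the two genotypes. This assumes that a homozygote (12, 12) compared with a heterozygote (12, 13) scores 1, since only one copy of allele 12 can be matched; the function names and locus labels are hypothetical.

```python
from collections import Counter


def shared_allele_score(g1, g2):
    """Score 2, 1 or 0: size of the multiset intersection of the two genotypes."""
    return sum((Counter(g1) & Counter(g2)).values())


def sharing_summary(profile1, profile2):
    """Total alleles shared and number of loci sharing at least one allele."""
    scores = [
        shared_allele_score(profile1[loc], profile2[loc])
        for loc in profile1
        if loc in profile2
    ]
    return {"alleles_shared": sum(scores), "loci_shared": sum(1 for s in scores if s > 0)}


pair_a = {"L1": (12, 13), "L2": (9, 9), "L3": (7, 8)}
pair_b = {"L1": (13, 12), "L2": (9, 10), "L3": (6, 10)}
print(sharing_summary(pair_a, pair_b))
```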

True relative pairs were designated “false negatives” when LR ≤ 1 and “inconclusive” when 1 < LR ≤ 10, while truly unrelated pairs that nevertheless yielded LR ≥ 1 were designated “false positives”.

Sensitivity (the probability of correctly classifying two related subjects as relatives) and specificity (the probability of correctly classifying two unrelated subjects as unrelated) were determined at an LR cut-off of 1.
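A minimal Python sketch of these two measures at a given cut-off. It follows the definitions above, so an LR exactly equal to 1 counts as a false negative for relatives and as a false positive for unrelated pairs; the LR values in the example are hypothetical.

```python
def sensitivity_specificity(related_lrs, unrelated_lrs, cutoff=1.0):
    """Relatives are correctly assigned when LR > cutoff; unrelated pairs when LR < cutoff."""
    true_pos = sum(1 for lr in related_lrs if lr > cutoff)
    true_neg = sum(1 for lr in unrelated_lrs if lr < cutoff)
    return true_pos / len(related_lrs), true_neg / len(unrelated_lrs)


related = [0.5, 5.0, 50.0, 500.0]  # hypothetical combined LRs of true relatives
unrelated = [0.01, 0.2, 3.0, 0.9]  # hypothetical combined LRs of true unrelated pairs
print(sensitivity_specificity(related, unrelated))  # (0.75, 0.75)
```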

The term “grey zone” refers to the range of LR values in which uncertainty exists because the LR distributions of true relatives and true unrelated pairs overlap.
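One simple way to operationalize this definition, assumed here for illustration, takes the overlap between the lowest LR observed among true relatives and the highest LR observed among true unrelated pairs. Published grey-zone limits may instead rely on percentiles of the two distributions. The function names and values below are hypothetical.

```python
def grey_zone(related_lrs, unrelated_lrs):
    """Interval where the two LR distributions overlap, or None if they are separated."""
    low = min(related_lrs)     # weakest LR among true relatives
    high = max(unrelated_lrs)  # strongest LR among true unrelated pairs
    return (low, high) if low <= high else None


def pairs_in_zone(lrs, zone):
    """Count pairs whose LR falls inside the grey zone."""
    if zone is None:
        return 0
    low, high = zone
    return sum(1 for lr in lrs if low <= lr <= high)


related = [0.5, 5.0, 50.0]
unrelated = [0.01, 3.0, 0.2]
zone = grey_zone(related, unrelated)
print(zone, pairs_in_zone(related, zone), pairs_in_zone(unrelated, zone))
```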

Allele frequencies and other relevant forensic biostatistical parameters for a Northern Italian population had previously been obtained using the Identifiler (Applied Biosystems) (D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, and FGA), Fusion (Promega) (D3S1358, D1S1656, D2S441, D10S1248, D13S317, Penta E, D16S539, D18S51, D2S1338, CSF1PO, Penta D, TH01, vWA, D21S11, D7S820, D5S818, TPOX, DYS391, D8S1179, D12S391, D19S433, FGA, and D22S1045) and HDplex (Qiagen) kits (SE33, D2S1360, D3S1744, D4S2366, D5S2500, D6S474, D7S1517, D8S1132, D10S2325, D21S2055) [17–19].
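The parameters used later to rank loci (Het, PIC, PD, PE) follow standard textbook formulas. A hedged Python sketch, assuming Hardy-Weinberg equilibrium for expected heterozygosity, PIC and PD, and using a widely used approximation for PE that takes heterozygosity h as input (often attributed to Brenner and Morris); the allele frequencies in the example are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) under Hardy-Weinberg equilibrium."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content (Botstein et al. formula)."""
    return expected_heterozygosity(freqs) - sum(
        2 * (p * q) ** 2 for p, q in combinations(freqs, 2)
    )


def power_of_discrimination(freqs):
    """PD = 1 - sum of squared genotype frequencies (HWE genotype frequencies)."""
    genotype_freqs = [p * p for p in freqs] + [2 * p * q for p, q in combinations(freqs, 2)]
    return 1.0 - sum(g * g for g in genotype_freqs)


def power_of_exclusion(h):
    """PE approximation from heterozygosity h: h^2 * (1 - 2h(1 - h)^2)."""
    return h * h * (1.0 - 2.0 * h * (1.0 - h) ** 2)


locus_freqs = [0.30, 0.25, 0.20, 0.15, 0.10]  # hypothetical allele frequencies
he = expected_heterozygosity(locus_freqs)
print(he, pic(locus_freqs), power_of_discrimination(locus_freqs), power_of_exclusion(he))
```

All four measures grow as allele frequencies become more numerous and more even, which is why highly heterozygous loci contribute most to kinship LRs.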

Results

To evaluate the impact of the additional loci on the resulting LR values, LR distributions obtained with 15 STRs (Identifiler, Applied Biosystems) and 22 STRs (Fusion, Promega), alone and in conjunction with the HDplex (Qiagen) supplementary loci, were compared. HDplex was also assessed as a stand-alone panel for discerning kinship relationships.

Number of loci and alleles shared

Across the three generations assessed, full siblings shared, on average, the highest number of loci and alleles of all the degrees of relatedness examined. This was expected, since full siblings share, on average, half of their alleles IBD and share both alleles IBD at about one quarter of loci, whereas non-inbred half siblings, grandparent-grandchild, uncle-nephew, and cousin pairs cannot share two alleles IBD at the same locus.

The average number of alleles shared by first-degree relatives (full siblings) increased progressively from 14 to 19, 27, 32, and 40 when the HDplex, Identifiler, Fusion, Identifiler + HDplex, and Fusion + HDplex kits and combinations were used, respectively. Similarly, the average number of shared loci increased from 10 to 13, 19, 22, and 28 in the same order. However, once normalized to the number of loci included in each kit, the values did not differ significantly (t test, p < 0.01). Therefore, the number of STRs used in the calculation did not appear to affect overall LR values.
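The normalization step is not described in detail. A sketch under the assumption that shared alleles are divided by the maximum possible (two per locus) and shared loci by the number of loci in each kit, applied to the averages reported above:

```python
# Averages reported above for first-degree relatives (full siblings):
# (alleles shared, loci shared, loci in the kit or combination)
AVERAGES = {
    "HDplex": (14, 10, 12),
    "Identifiler": (19, 13, 15),
    "Fusion": (27, 19, 22),
    "Identifiler + HDplex": (32, 22, 26),
    "Fusion + HDplex": (40, 28, 32),
}


def normalized_sharing(alleles_shared, loci_shared, n_loci):
    """Fraction of the maximum possible shared alleles (2 per locus) and of loci shared."""
    return alleles_shared / (2 * n_loci), loci_shared / n_loci


for kit, (alleles, loci, n) in AVERAGES.items():
    allele_frac, locus_frac = normalized_sharing(alleles, loci, n)
    print(f"{kit:22s} alleles {allele_frac:.3f}  loci {locus_frac:.3f}")
```

Under this assumption the normalized fractions stay within roughly 0.58–0.63 for alleles and 0.83–0.88 for loci across all five panels, consistent with the absence of a kit-dependent difference.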

Nevertheless, the difference in the average number of shared loci and alleles between full siblings and first cousins was tenfold greater with Fusion + HDplex than with Identifiler + HDplex (Table S1A-S1B).

LR values

With regard to posterior probabilities, the lower LR cut-off of 200 (W = 99.5 %, assuming a 50 % prior probability) adopted in classical paternity testing of mother-father-child trios [20] has not been established for kinship analyses as a criterion to discriminate between related and unrelated subjects.
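The posterior probability W quoted here follows from Bayes’ theorem in odds form (posterior odds = LR × prior odds). With a prior probability π of relatedness:

```latex
W = \frac{\mathrm{LR}\,\pi}{\mathrm{LR}\,\pi + (1 - \pi)}, \qquad
\pi = 0.5 \;\Rightarrow\; W = \frac{\mathrm{LR}}{\mathrm{LR} + 1}

\mathrm{LR} = 200:\; W = \tfrac{200}{201} \approx 0.995 \;(99.5\,\%), \qquad
\mathrm{LR} = 10:\; W = \tfrac{10}{11} \approx 0.909 \;(90.9\,\%), \qquad
\mathrm{LR} = 1:\; W = 0.5
```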

In this study, an LR threshold of 10, corresponding to W = 90.9 %, was adopted. Values at or below this threshold were classified as “inconclusive (1 < LR ≤ 10)/negative (0 < LR ≤ 1): no support for relationship”, and values above it as “positive: very strong evidence”. Positive results were further divided, according to LR value, into four categories: moderate (10 < LR ≤ 100), moderately strong (100 < LR ≤ 1000), strong (1000 < LR ≤ 10000), and very strong (LR > 10000) [21, 22].
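These thresholds can be expressed as a small lookup. A Python sketch (function names hypothetical) that maps a combined LR to the categories listed above, with each upper bound inclusive, and tallies a set of pairs:

```python
from collections import Counter

# Inclusive upper bound of each category, as listed above
CATEGORIES = [
    (1, "negative"),
    (10, "inconclusive"),
    (100, "moderate"),
    (1_000, "moderately strong"),
    (10_000, "strong"),
]


def verbal_category(lr):
    """Return the category of a combined LR; anything above 10000 is 'very strong'."""
    for upper, label in CATEGORIES:
        if lr <= upper:
            return label
    return "very strong"


def tally(lrs):
    """Number of pairs falling in each category."""
    return Counter(verbal_category(lr) for lr in lrs)


print(tally([0.4, 7.0, 35.0, 2_500.0, 40_000.0, 12.0]))
```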

When the STR panels were assessed individually, the Fusion kit provided the highest proportion of pairs with LR ≥ 10 across the three generations considered, with a decreasing trend from full siblings to first cousins. Only within this last group did the HDplex kit perform slightly better than Fusion (24 % vs. 20 %, Table S1A-S1B).

Of the two extended STR combinations, Identifiler + HDplex (n = 26 STRs) and Fusion + HDplex (n = 32 STRs), the latter proved to be the optimal set of markers, as it includes the largest number of STRs of all the sets tested.

Within full siblings, the proportion of pairs with LR ≥ 10 rose from 97 % with Fusion to 100 % with Fusion + HDplex, while an increase of 20 % was observed for each of the last two degrees of relatedness.

LR values between 0 and 1 and between 1 and 10 were defined as “negative” and “inconclusive”, respectively; true relative pairs wrongly falling into these intervals may lead to false negative conclusions.

For each generation, the percentages of negative/inconclusive results were highest with the Identifiler kit and, regardless of the STR set considered, these percentages rose substantially among half siblings and reached their maximum among first cousins (50 %).

Analogous considerations apply to the Identifiler + HDplex and Fusion + HDplex combinations; notably, the latter gave no negative/inconclusive results within the first generation of relatives, thus avoiding interpretation biases (Table 1 and Fig. S1).

Table 1 Percentages of related (Ha) and unrelated (Ho) pairs across the different relationships for various LR cut-offs, computed with (exact) and without (rough) loci in linkage

Among the three generations of unrelated pairs, the percentage of results falling in the inconclusive zone (1 ≤ LR < 10) increased markedly from the first to the third generation with the Identifiler, HDplex and Fusion sets. With the Identifiler kit, false positive results (LR ≥ 10) were found among unrelated pairs tested as full siblings, half siblings, and uncles-nephews; with the Fusion kit, they were limited to unrelated pairs tested as uncles-nephews and first cousins (Table S1A-S1B).

When the extended STR combinations were considered, the percentages of unrelated pairs falling in the inconclusive zone decreased compared with the single kits, and no false positives were detected.

Exact and rough LR values

With enlarged batteries of STR markers, the likelihood of including loci closely spaced on the same chromosome increases, which may complicate the application of the product rule for LR calculation in kinship analyses [9, 14]. As previously reported, the presence of linked loci may require either excluding the less informative locus [23] or, alternatively, adopting a more conservative approach that includes the recombination rate in the LR computation [14]. Based on previous studies, loci closer than 50 centimorgans (cM) were identified [11–15], and the less informative among them were removed from the calculations, according to the statistical parameters (PIC, Het, PE and PD) determined in the North Italian population [17–19]. Specifically, these were D5S818 (Identifiler), D6S474 (HDplex), and vWA, Penta D and D5S818 (Fusion). In total, six and seven STRs were excluded from the Identifiler + HDplex (D5S818, TPOX, D6S474, D8S1179, vWA, D21S11) and Fusion + HDplex (the same loci plus Penta D) sets, respectively.
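The pruning rule can be sketched in Python. This is an assumed, simplified version: it ranks loci by a single parameter (PD) rather than by the combination of PIC, Het, PE and PD used in the study, and the locus names and map positions in the example are hypothetical. The Haldane map function is included to show how a map distance translates into the recombination fraction that the alternative, linkage-aware approach would need.

```python
import math
from dataclasses import dataclass


@dataclass
class Locus:
    name: str
    chromosome: str
    position_cm: float  # genetic-map position in centimorgans
    pd: float           # power of discrimination, used here as the informativeness score


def prune_linked(loci, threshold_cm=50.0):
    """Greedily keep the most informative loci, dropping any locus that lies within
    threshold_cm of an already-kept locus on the same chromosome."""
    kept = []
    for locus in sorted(loci, key=lambda x: x.pd, reverse=True):
        linked = any(
            k.chromosome == locus.chromosome
            and abs(k.position_cm - locus.position_cm) < threshold_cm
            for k in kept
        )
        if not linked:
            kept.append(locus)
    return kept


def haldane_recombination_fraction(distance_cm):
    """Haldane map function: recombination fraction for a map distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


# Hypothetical loci and map positions, for illustration only
panel = [
    Locus("A", "5", 10.0, 0.90),
    Locus("B", "5", 40.0, 0.80),
    Locus("C", "5", 120.0, 0.85),
    Locus("D", "12", 30.0, 0.70),
]
print([locus.name for locus in prune_linked(panel)])  # ['A', 'C', 'D']
print(round(haldane_recombination_fraction(50.0), 3))  # 0.316
```

Note that at 50 cM the recombination fraction is still about 0.32 rather than 0.5, which is why loci below this distance are treated as potentially linked.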

The numbers of shared alleles and loci, as well as the LR distributions computed with and without the loci in linkage (exact and rough, respectively), were very similar for both relatives and non-relatives (Table S1A-S1B and Fig. S1), with only minor discrepancies in some scenarios.

For HDplex (12 STRs exact, 11 STRs rough), the percentage of true half-sibling pairs with 1 ≤ LR < 10 rose from 26 % (exact) to 32 % (rough), while for unrelated pairs tested as half siblings it rose from 0 (exact) to 5 % (rough). Interestingly, after the 3 loci in LD were excluded from the 22-STR set (Fusion), the percentage of grandparent-grandchild pairs with 1 ≤ LR < 10 rose from 28 to 43 % (Table 1).

For the extended Identifiler + HDplex set (20 markers after removal of linked loci), the largest increase was detected for grandparent-grandchild pairs falling within the “inconclusive” zone, where the proportion of relatives rose from 18 to 25 % and that of non-relatives from 0 to 4 %.

In the same group of relatives, the Fusion + HDplex combination (25 STRs after removal of linked loci) showed similar changes for related (from 11 to 25 %) and unrelated (from 0 to 4 %) pairs. Analogous increases were also observed for half siblings (Table 1).

In summary, the reduction in combined LR values caused by excluding loci in LD from the calculation increased the percentages of related and unrelated pairs falling in the inconclusive zone and widened the overlap between their LR distributions, thereby increasing uncertainty.

However, in some scenarios, removing the less informative of the linked loci unexpectedly decreased the overlap between related and unrelated pairs, and the reduction in the number of STRs mostly had negligible effects on the LR computation, since only poorly discriminating markers were excluded. For example, within the interval 1 ≤ LR < 10, the proportion of half-sibling pairs decreased with both Fusion (exact 27 %, rough 16 %) and Identifiler + HDplex (exact 16 %, rough 5 %). For the latter, the reduction also affected the unrelated pairs (exact 5 %, rough 0 %) (Table 1).

Previous studies defined as the “grey zone” the range of uninformative results with LR values from 0.067 to 10.3 [24–26]. Data from this study are in line with the literature, with a few exceptions among non-relatives, for which LR values of 100 were observed with the Identifiler and Fusion kits (Table 1).

Comparison with other studies

Restricting the analysis to full and half siblings, the results of this study were compared with those of other empirical and simulated studies [27, 28] based on 13/15 and 20/22 STRs (Table 2).

Table 2 Comparison of percentages between empirical (A) (this study and von Wurmb-Schwark et al. 2015 [27]) and simulated (B) (O’Connor 2011 [28]) data, for related and unrelated groups, within full siblings and half-siblings considering different cutoffs of the combined LR

The empirical data show consistent concordance: at least 81 % of full-sibling pairs yielded LR ≥ 10 (W ≥ 90.9 %) with 15 STRs (Identifiler/PowerPlex 16HS), and 90 % with 20/22 STRs (PowerPlex 21/Fusion). Some discrepancies were observed for LR ≤ 1 (W ≤ 50 %), where the percentages reported by von Wurmb-Schwark et al. (2015) [27] were three to seven times higher. The same considerations apply to half-sibling pairs.

Robust concordance was also found with simulated data for the 13/15-STR (Combined DNA Index System (CODIS), Identifiler and PowerPlex 16HS) and 20/22-STR (CODIS + ESS, Fusion and PowerPlex 21) datasets [27, 28].

Sensitivity and specificity

Of the three commercial kits assessed, HDplex had the highest sensitivity and specificity, even after exclusion of the D6S474 marker. Of the two STR combinations, Identifiler + HDplex and Fusion + HDplex performed equally in terms of specificity, although the latter showed the higher sensitivity (Table S2). These findings are in line with previous simulation-based studies [6–8, 12].

Discussion

The aim of this work was to determine which of three commercial multiplex kits and their combinations (from 12 to 32 STRs) most efficiently discriminated relatives from non-relatives up to the third degree of relatedness, excluding parent-child pairs.

As expected, the extended STR combinations (Identifiler + HDplex and Fusion + HDplex) improved the resolution of kinship relationships, owing to the supplementary markers they include.

This result is in accordance with the literature [10], which reports that moving from 15 to 24 loci improved the separation between relatives and non-relatives and narrowed the area of overlap between the false negative and false positive curves, defined as the “grey zone”. In this study, the effect of increasing the panel from 12 (HDplex) to 32 STRs (Fusion + HDplex) was greatest among full siblings, who share more IBD alleles, and almost null among first cousins. This was expected, since the number of IBD alleles decreases with each additional generation, and the loss may be partially compensated by increasing the number of relative pairs considered [9]. A final consideration concerns the statistical parameters (Het, PIC, PD, PE) of the STR loci investigated. A highly polymorphic and heterozygous locus, with allelic variants evenly distributed in the population, strengthens the multiplex in which it is included and may represent a valid supplementary STR for difficult kinship analyses. Conversely, the more poorly discriminating loci a panel includes, the more negatively the resulting LR is affected. Indeed, the HDplex kit, with only 12 STRs, combines loci with Het values ≥ 0.7 and high PD. This in turn improved sensitivity and specificity, which were higher than with Identifiler and Fusion, and suggests that HDplex could be used in kinship cases not only to supplement current commercial kits but also as a stand-alone STR panel.

Rather than modeling LD, we assessed how excluding linked loci from the LR calculation affected the differentiation between related and unrelated pairs, since such exclusion may increase the chances of false positives and false negatives. The calculated posterior probabilities were lower for all the relationships assessed. However, removing loci in LD caused no major change in related/unrelated differentiation, even when 7 loci were excluded from the Fusion + HDplex combination (32 loci). Overall, among the three kits tested, HDplex proved to be the best-performing single set of autosomal loci for kinship testing.