Introduction

Advances in DNA extraction techniques, amplification chemistries, and capillary electrophoresis (CE) separation methods have collectively increased the sensitivity of forensic DNA analysis. As a result, a larger percentage of short tandem repeat (STR) profiles are mixtures, and their interpretation has become increasingly challenging. Manual interpretation of STR mixtures is commonly referred to as the binary method [1] and has been the standard in the forensic community for almost three decades. Binary interpretation typically reports alleles according to static thresholds and, as a result, limits the information considered within the profile. Parameters such as peak height and heterozygous balance are used to assist with determining genotype combinations in simple mixtures, which results in a probability of one or zero for inclusion of a reference source in the profile. However, binary interpretation of complex mixtures breaks down quickly with an increase in the number of contributors or a reduction in profile quality associated with lower template amounts and DNA degradation. These latter circumstances necessitate an advanced interpretation approach. Probabilistic genotyping (PGing) software has provided a solution, as the algorithm-driven, continuous mode of interpretation utilizes more information in the profile, resulting in significant improvements in the analysis of complex mixtures. Various PGing approaches have been assessed and validated [2,3,4,5,6,7,8], supported by dozens of studies in the literature (for example, [9,10,11,12,13,14,15,16,17,18,19,20]), including recommended validation guidelines [21].

PGing software uses profile characteristics such as allelic peak heights, stutter percentages, drop-out rates, and degradation assessments to calculate the likelihood that alleles belonging to a presumed contributor are consistent with being a component of a mixture profile. For example, peak heights are used to assess expected profile variation at a specific locus [13]. When less DNA is amplified, peak heights decrease (impacting expected heterozygous balance), leading to a lower level of confidence or certainty for an inclusion. In addition to these fundamental factors influencing profile quality, testing methods can impact variability. For example, peak heights are impacted by an increase in PCR cycle number, reduced PCR reaction volume, or when using a 3500 CE instrument compared to a 3130xl. With a software driven PGing approach, variability can be assessed as a continuum of information and levels of confidence can be translated into likelihoods of inclusion or exclusion that reflect the specific characteristics of the typing system and the resulting mixture profile.

One of the challenges when interpreting STR mixtures is the possibility that the sample being tested exhibits differentially degraded sources of DNA, including the possibility of being deposited at different times. This is especially true for samples collected from surfaces like a tabletop or door handle. While previous studies have assessed the impact of DNA degradation on PGing results [5, 22, 23], the samples evaluated were not purposefully degraded in a differential manner, or if differential degradation was evaluated, it was done using in silico mixtures. The dataset of STR profiles in the current study was generated with sample sources of DNA that had varying levels of physical degradation through ultrasonication. The recently developed and validated (M.S. Adamowicz, T.N. Rambo, J.L. Clarke, Internal validation of the MaSTR™ probabilistic genotyping software for the interpretation of 2–5 person mixed DNA profiles, manuscript submitted for publication, personal communication) PGing software package MaSTR™ (SoftGenetics, LLC) was used to develop statistical weight estimates for a variety of two-person STR mixture profiles from differentially degraded sources of DNA. Assessments included the effectiveness of MaSTR™ to assess known levels of differential degradation and the impact of differential degradation on resulting weight estimates. These studies provide additional support for the use of PGing software solutions, addressing concerns raised by the President’s Council of Advisors on Science and Technology (PCAST) [24].

Materials and methods

Biological sample collection, extraction, and quantification

Buccal swabs were collected from four (4) unrelated individuals, two female (F1 and F2) and two male (M1 and M2) donors. Samples were collected with consent according to an Institutional Biosafety Committee approved protocol #IBC-48221. Genomic DNA was isolated using a traditional organic extraction procedure; depending on the size of the swab cutting, samples were incubated for 1 h at 56 °C with 400–600 μL of stain extraction buffer [2% SDS (Amresco; Solon OH), 10 mM EDTA (Promega; Madison, WI), 100 mM NaCl (Dot Scientific; Burton, MI), 7.6 mM Tris–HCl, pH 8.0 (Quality Biological; Gaithersburg, MD)] and 10–15 μL of 20 mg/mL Proteinase K (ThermoFisher Scientific; Carlsbad, CA), followed by purification with 400–600 μL of molecular biology grade phenol–chloroform-isoamyl alcohol (ThermoFisher Scientific), precipitation with 40 μL of sodium acetate (3 M, pH 5.3) and 1.0 mL of cold 100% ethanol, a wash with 70% ethanol, and resuspension of DNA pellets in 50 μL of low TE buffer (10 mM Tris–HCl, pH 7.5; 0.1 mM EDTA; Sigma-Aldrich; St. Louis, MO). Multiple extracts from single donors were pooled and stored at 4 °C for up to 6 weeks.

Quantification of buccal DNA extracts was performed with Quantifiler™ HP (ThermoFisher Scientific) on the Applied Biosystems 7500 instrument according to the manufacturer’s recommended protocol, using the small autosomal target. A portion of the extracts was retained and used as the undegraded single-source control and for the pristine (P) component of mixed samples. After mechanical shearing of the remaining portion, the degraded DNA samples were quantified again with the Quantifiler™ HP assay, using the small autosomal target for quantification and the large target to confirm degradation status. All quantification reactions were performed in duplicate, standard curve parameters were within the acceptable range, and all values were reproducible and within the dynamic range of the standard curve, adhering to the MIQE (minimum information for publication of quantitative real-time PCR experiments) and FSIG (Forensic Science International: Genetics) guidelines [25, 26].

Shearing of DNA extracts

Aliquots (~ 55 μL) of buccal DNA extracts for all donors, containing at least 10 ng/μL genomic DNA, were mechanically sheared using a Covaris S220. Target base pair (bp) sizes were 150 and 250 to simulate severe and moderate levels of DNA degradation, respectively. These values align well with the size of STR loci in commercially available kits, including the Fusion 6C STR amplification kit (Promega), which range from ~ 80 to 480 bps. The following Covaris settings were used: a peak incident power of 75 W, 10% duty factor, 200 cycles per burst, and either 510 s of treatment time for the 150-bp samples or 160 s of treatment time for the 250-bp samples. An aliquot of each of the sheared DNA samples was run on a Bioanalyzer 2100 (Agilent) with a high sensitivity chip to ensure that the correct target sizes were achieved. Examples of the Bioanalyzer results are provided in Supplemental Fig. 1. Note: while mechanical shearing may not replicate natural degradation, it allows for the controlled analysis of samples when forming differentially degraded samples.

DNA mixture preparation and experimental design

Different contributor ratios (1:1, 1:3, 1:6, and 1:10), total DNA input amounts in the PCR (0.1, 0.25, and 0.5 ngs), and degradation status (P = no degradation, 250 = physically sheared DNA to an average length of 250 bps, and 150 = physically sheared DNA to an average length of 150 bps) were used to make a total of 144 two-person mixtures from the two donor pairs (M1:F1, F2:M2) and are listed in Table 1; 4 ratios times 3 input amounts, times 6 combinations of degradation status, times 2 donor pairs = 144 amplifications. The major contributor for the M1-F1 pair was always M1 and was always F2 for the F2-M2 pair.
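The mixture count can be checked by enumerating the factorial design. The following is a minimal Python sketch that assumes only the factor levels stated above; the variable names are illustrative and are not part of the study's workflow.

```python
from itertools import product

# Factor levels as described in the text (variable names are illustrative).
ratios = ["1:1", "1:3", "1:6", "1:10"]
inputs_ng = [0.1, 0.25, 0.5]
degradation = ["P:P", "P:250", "P:150", "250:250", "150:250", "150:150"]
donor_pairs = ["M1:F1", "F2:M2"]

# Every combination of the four factors corresponds to one amplified mixture.
mixtures = list(product(donor_pairs, ratios, inputs_ng, degradation))
print(len(mixtures))  # 4 * 3 * 6 * 2 = 144
```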

Table 1 Donor ratios (1:1, 1:3, 1:6, or 1:10), total DNA input (0.1, 0.25, or 0.5 ngs), and level of extract degradation (P:P, P:250, P:150, 250:250, 150:250, 150:150, where P is no degradation, 250 is physically sheared DNA to an average length of 250 bps, and 150 is physically sheared DNA to an average length of 150 bps) were used to make a total of 144 mixtures for the two-person donor pairs (M1:F1, F2:M2). LR calculations were generated by MaSTR™ for datasets associated with the different mixture ratios, varying the number of iterations performed by MaSTR™, and with or without including a conditioning profile. Total analyses performed was 864. Details can be found in Supplemental Table 1

STR amplification, CE separation, and profile analysis

Pristine, single-source samples from each donor, along with the 144 mixture samples, were amplified using the PowerPlex Fusion 6C amplification kit (Promega) according to the manufacturer’s recommendations, with the exception that the total reaction volume was reduced to 12.5 μL by reducing kit reagents by 50%. Thermal cycling conditions were as follows: 96 °C for 1 min; 29 cycles of 96 °C for 5 s and 60 °C for 1 min; and a final extension at 60 °C for 10 min. Amplification products (1 μL) were separated on a 3130xl Genetic Analyzer (ThermoFisher Scientific) in 10 μL of loading solution (9.5 μL Hi-Di formamide and 0.5 μL WEN ILS 500 size standard), with injection for 10 s at 3 kV.

STR profiles were interpreted manually using the GeneMarker® HID software (v2.9.0, GM HID) [27], using an analytical threshold (AT) of 60 RFU. Peaks determined to be the result of pull-up, minus A addition, and other artifacts were not designated as true alleles and not included in the MaSTR™ analysis. A Genotype File, containing all of the alleles and their respective peak heights, was generated for each sample profile.
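As an illustration of how an analytical threshold acts on peak data, the short Python sketch below filters a list of peaks at 60 RFU. The peak values are hypothetical, the function name is invented for this example, and treating a peak at exactly 60 RFU as passing is an assumption; the actual calls in this study were made in GeneMarker® HID with manual review.

```python
AT_RFU = 60  # analytical threshold stated in the text

def apply_threshold(peaks, threshold=AT_RFU):
    """Keep (allele, height) pairs whose height is at or above the threshold.

    Whether a peak exactly at the threshold is retained is an assumption here.
    """
    return [(allele, height) for allele, height in peaks if height >= threshold]

# Hypothetical peaks at a single locus: (allele label, peak height in RFU).
example_peaks = [("12", 845), ("13", 58), ("14", 790), ("15", 61)]
print(apply_threshold(example_peaks))
```

Artifact peaks (pull-up, minus A) would still need to be removed separately, since a height filter alone cannot distinguish them from true alleles.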

MaSTR™ analysis

The Markov Chain Monte Carlo (MCMC) method is used by MaSTR™ to draw sample profiles from a distribution of variables related to STR peak characteristics; see the MaSTR™ User Guide, v1.9 or later, for details (www.softgenetics.com). The MCMC algorithm used by MaSTR™ is Metropolis–Hastings [28]. A starting point is identified for the associated variables, and at each step (iteration) in the process the software goes through the list of variables one at a time. Each variable is changed to a new value by sampling from a probability distribution, resulting in a new set of conditions; this process is called Gibbs sampling. From the probability model, the likelihood of this new set of conditions is calculated and compared to the likelihood of the previous set. The ratio of these values is assessed and the software either moves to the new set of conditions (acceptance) or stays at the old set (rejection). MCMC sampling is, therefore, designed so that the fraction of iterations spent at a set of conditions approximates the posterior probability of those conditions.
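The accept/reject logic described above can be sketched with a generic, textbook Metropolis–Hastings sampler that updates one variable at a time. This is not the MaSTR™ implementation: the target density (two independent standard normal variables), the Gaussian proposal, the step size, and all function names are assumptions chosen only to make the example self-contained.

```python
import math
import random

def log_target(x):
    """Toy log-density (an assumption): two independent standard normals."""
    return -0.5 * sum(v * v for v in x)

def metropolis_hastings(n_iter, start, step=1.0, seed=1):
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_target(current)
    samples = []
    for _ in range(n_iter):
        # Visit the variables one at a time within each iteration.
        for i in range(len(current)):
            proposal = list(current)
            proposal[i] += rng.gauss(0.0, step)  # symmetric random-walk proposal
            proposal_lp = log_target(proposal)
            # Accept with probability min(1, target(proposal) / target(current)).
            accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
            if rng.random() < accept_prob:
                current, current_lp = proposal, proposal_lp
        samples.append(tuple(current))
    return samples

samples = metropolis_hastings(n_iter=5000, start=[3.0, -3.0])
kept = samples[1000:]  # discard an initial burn-in portion
mean_x0 = sum(s[0] for s in kept) / len(kept)
print(round(mean_x0, 2))
```

After burn-in, the retained samples visit each region in proportion to its probability under the target, which is why the sample mean of the first variable settles near zero for this toy target.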

MaSTR™ establishes a model that assesses the distribution of STR profile peak heights given the characteristics of the mixture, for example, the amount of template DNA amplified per contributor, the number of contributors, the contributor ratios, and the possible genotype combinations of the contributors. The model allows the software to assess the variables associated with the actual mixture profile, providing a relative likelihood of the evidence given a proposed, modeled profile. A model was established for this study by running Fusion 6C control data at varying template amounts. Three hundred and four (304) data files of varying quality were converted into two types of text files: a Genotype File containing all of the alleles and their respective peak heights, and a Signal File containing the alleles, their respective peak heights, and n-1, n-2, and n+1 stutter alleles and peak heights; n-0.5 and n+0.5 stutter (i.e., half-stutter) was not considered. The Genotype and Signal Files were generated for each of the 304 samples with stutter filters disabled in GM HID to enable the labeling of all stutter peaks above the AT; n-2 stutter was primarily observed in the blue and green dye channels of Fusion 6C.

For each of the analysis variables, MaSTR™ provides corresponding output plots. The first is the trace plot, which shows the value of the variable used by the MCMC at each iteration. Iterations typically form the x-axis and values associated with the variable are plotted on the y-axis. If a variable has multiple components, then each is plotted on one common graph. If multiple chains are used in the MCMC simulation, then the iteration axis represents the iterations of the first chain followed by the iterations of the second chain, and so forth. The second plot is a histogram for each variable reflecting the number of times that the MCMC used a value within a small interval. The x-axis represents possible values of the variable, with individual bins representing a range of possible values. The height of each bar represents the relative number of times that a value in that range was used, so a taller bar indicates a region of values that was chosen more often than a region with a shorter bar. When a variable has multiple components, all chains are combined to make the histogram; because the ordering of iterations is lost, effects that are visible in the trace plots can be masked.
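The relationship between the two plot types can be sketched in a few lines of Python. The sampled values below are hypothetical mixture proportions, and the function names, bin limits, and bin count are assumptions for illustration; they do not reflect how MaSTR™ builds its plots internally.

```python
def concatenate_chains(chains):
    """Trace-plot order: all iterations of chain 1, then chain 2, and so on."""
    return [value for chain in chains for value in chain]

def histogram(values, n_bins, lo, hi):
    """Count how many sampled values fall in each equal-width bin."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts

# Hypothetical samples of one variable from two short chains.
chains = [[0.241, 0.263, 0.252, 0.271], [0.233, 0.249, 0.258, 0.247]]
trace = concatenate_chains(chains)               # x-axis order of a trace plot
counts = histogram(trace, n_bins=3, lo=0.22, hi=0.28)
print(counts)  # [1, 5, 2]
```

The histogram keeps only how often each range of values was used, so the order in which the chain visited those values, which the trace preserves, is no longer recoverable.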

The goal of MaSTR™ and the MCMC process is to assess how likely it is to see the evidence profile if the person of interest (POI) is or is not a contributor to a mixed STR profile. The outcome is a likelihood ratio (LR) reflecting the relative strength of the evidence. MaSTR™ analysis was performed on the 144 two-person mixture profiles under varying conditions, resulting in 864 analyses; Supplemental Table 1 lists the 864 analyses performed. The major and minor contributors were fixed, with the number of contributors (NOC) set at two; the major contributor for all M1:F1 mixtures was M1 and for F2:M2 mixtures was F2, with the minor associated with the other individual in each pair, F1 and M2, respectively. As the ratio of contributors varied, the LR under the proposition that either M or F was the POI, with the other individual designated as the conditioning profile, was tested. Following a burn-in of 8000 iterations, analyses were performed with eight chains of either 10,000 or 40,000 iterations to determine if a significant difference existed between the two approaches. The population database of Hill et al. [29] was used for generating LR calculations, a co-ancestry value of 0.01 was used, and the kinship option was not selected. In addition, five replicate analyses were performed on 25 outliers to assess variance in LRs and whether run-to-run variability contributes to the differences. Finally, a subset of analyses (288) were run at 10,000 iterations without a conditioning profile to assess the impact of less information on LR outcomes.
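The total of 864 analyses follows from the design described above: each of the 144 mixtures is evaluated with each donor in turn as the POI, under three analysis conditions. The Python sketch below reproduces that arithmetic; the variable names and labels are illustrative only.

```python
n_mixtures = 144                       # two-person mixtures (Table 1)
pois = ["major donor", "minor donor"]  # each donor is tested as the POI in turn

# The three analysis sets described in the text:
# (iterations per chain, conditioning profile included?)
analysis_sets = [(10_000, True), (40_000, True), (10_000, False)]

per_set = n_mixtures * len(pois)
total = per_set * len(analysis_sets)
print(per_set, total)  # 288 864

# Post-burn-in iterations per analysis when eight chains are run.
n_chains = 8
for iterations, conditioned in analysis_sets:
    print(iterations, conditioned, n_chains * iterations)
```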

Statistical analysis

Restricted, modified (mod) random match probabilities (RMPs) are calculated by assessing the possible profile combinations at each locus given the assumed number of contributors and observed peak heights and balance. The modRMPs were calculated manually for representative two-person mixtures and compared to LRs produced by MaSTR™, with the assumption that LR values would be significantly higher. This provided a baseline for expected LRs and highlighted any situation when the LR dropped below a modRMP value. In no case did the LR drop below the modRMP. In fact, the LR was generally many orders of magnitude higher than the modRMP, with the greatest difference (> 10 orders of magnitude) occurring for 1:1 mixtures when 0.5 ngs of pristine template was amplified.

When assessing the LR outcomes of the 864 analyses, preliminary testing on the sample distribution using the Shapiro–Wilk normality test [30] indicated that the dataset is significantly different from a normal distribution (p = 2.2 × 10^−16). Based on the Shapiro–Wilk test, statistical significance was evaluated by applying the non-parametric Kruskal–Wallis test [31], with post hoc testing using Dunn’s test (FSA package 0.8.30) with Holm correction for multiple testing [32]. All statistical analyses were conducted using RStudio [33, v.1.3.959; R v4.0.2].
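To make the multiple-testing step concrete, the following Python sketch implements the standard Holm step-down adjustment for a set of pairwise p-values. It is an illustration only: the study's analyses were run in R, the function name is invented here, and the input p-values are hypothetical rather than taken from the results.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k + 1), cap it at 1,
        # and keep the adjusted values non-decreasing with a running maximum.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical unadjusted p-values for three pairwise comparisons.
raw = [0.010, 0.040, 0.030]
adjusted = holm_adjust(raw)
print(adjusted)
```

The smallest p-value receives the largest multiplier, so Holm's procedure controls the family-wise error rate while being less conservative than a plain Bonferroni correction.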

Results and discussion

Analysis of two-person mixture profiles

Likelihood ratios (LRs) were generated with MaSTR™ for the 144 two-person mixtures, resulting in 864 total analyses (Supplemental Table 1). Examples of MaSTR™ output files are presented in Supplemental Fig. 2 for a 1:3 mixture of M1:F1. The data in the figure represent ideal conditions (0.5 ngs of pristine donor templates), with the trace (Suppl Fig. 2A) and histogram (Suppl Fig. 2B) plots clearly reflecting a 1:3 ratio of the two contributors; ~ 25% of F1 to ~ 75% of M1, with M1 as the major in all M1:F1 mixtures. A total of 80,000 iterations were performed (8 chains of 10,000 iterations), as illustrated by the ratio assessments for each iteration across the trace plot. The histogram view converts trace information into a distribution of the data. Given the pristine nature of the DNA templates, the degradation was predictably low (Suppl Fig. 2C), with mean values for the two sources between 0.001 and 0.002; values produced by the software to indicate low-level degradation. When degradation is high, the value moves towards 0.01. As an example, the histogram plot in Supplemental Fig. 2D depicts a 1:3 mixture of F2:M2 with 0.5 ngs of pristine and highly degraded (150 bp) sources of DNA. As expected, the range of values for the degradation spans 0.001–0.01. Assessment of the plots for the 864 MaSTR™ analyses confirmed that the software properly characterizes data as reflected in the accompanying electropherograms.

Likelihood ratios for the 864 analyses were converted to log10 values [log(LR)] and plotted against the different mixture ratios, amounts of starting template, levels of differential degradation, and selection of the person of interest (POI). MaSTR™ analyses were performed with eight chains of either 10,000 or 40,000 iterations, and with or without a conditioning profile. Therefore, three sets of 288 analyses covering the original data (10,000 iterations and with a conditioning profile), the dataset at 40,000 iterations, and the dataset without a conditioning profile were performed. Figure 1A and B are examples of data associated with 1:1 mixtures of M1:F1, and Fig. 2A and B provide data for 1:1 mixtures of F2:M2, with Supplemental Fig. 3 capturing the remaining 216 analyses of the original dataset for the 1:3, 1:6, and 1:10 mixture ratios.

Fig. 1

Two-person mixtures at a ratio of 1:1 of the male (M1) and female (F1) contributor (M1:F1) associated with varying levels of degradation. The log of the LR values calculated by MaSTR™ are provided in the table below each figure and graphed for comparison purposes. Log values were generated for mixed samples of pristine (P) and degraded DNA (250 is sheared DNA to an average of 250 bps, and 150 is sheared DNA to an average of 150 bps; for example, P:250 is M1 pristine and F1 at 250 bps), and when considering M1 (panel A) or F1 (panel B) as the person of interest (POI)

Fig. 2

Two-person mixtures at a ratio of 1:1 of the female (F2) and male (M2) contributor (F2:M2) associated with varying levels of degradation. The log of the LR values calculated by MaSTR™ are provided in the table below each figure and graphed for comparison purposes. Log values were generated for mixed samples of pristine and degraded DNA (see Supplemental Table 1), and when considering F2 (A) or M2 (B) as the POI

When 0.25–0.5 ngs of pristine starting template, or ~ 0.125–0.25 ngs of each contributor, was amplified for the two donor pairs, log values were consistently greater than 30 (3.00E+01; LR > 10^30), reflecting expected values for the identified POI. For the M1:F1 mixtures, when the POI was associated with pristine DNA (M1, Fig. 1A), these values remained above 30 (LR > 10^30) for the P:250 and P:150 mixtures. However, as degradation levels increased for the POI, the log values dropped below 15 (LR < 10^15); in one case (0.25 ngs, 150:250), the value reached a level that slightly favored the donor as a non-contributor; log(LR) = −0.492 or an LR of 3.22E−1. When template amounts dropped to 0.1 ngs, log values decreased further for each mixture, impacted by reduced STR profile quality. As with manual analysis of mixtures, lower-quality STR data associated with sources of DNA at ~ 0.05 ngs for each contributor result in peak imbalance and allelic dropout, which reduce the weight in support of a contributor to the mixture and are exacerbated by higher levels of degradation.

When considering Fig. 2A and B, the pattern of log values was similar, recalling that the degradation levels were switched for the male and female donors, i.e., F2:M2. Values remained above 30 (LR > 10^30) for the P:250 and P:150 mixtures when the POI was associated with pristine DNA, most log values dropped as degradation levels increased for the POI, and log values decreased further when starting template dropped to 0.1 ngs. In three cases (0.1 ngs at 150:150 with F2 as the POI, 0.25 ngs at P:150 with M2 as the POI, and 0.1 ngs at P:250 with M2 as the POI), the log value favored the donor as a non-contributor; log(LR) = −0.754 or an LR of 1.76E−1, log(LR) = −5.63 or an LR of 2.34E−6, and log(LR) = −1.83 or an LR of 1.48E−2, respectively. Most importantly, when using MaSTR™, LR values remained relatively high when the POI was associated with low amounts of starting template (0.1 ngs) and/or degraded sources of DNA (150 or 250 bps of fragmented template); ranging from 10^−2 to 10^25, with the majority (69%) between 10^5 and 10^25 for M1:F1 mixtures, and from 10^−6 to 10^30 for F2:M2 mixtures, with the majority (69%) between 10^5 and 10^30. The three outliers for M1:F1 (0.25 ngs at 150:250, 0.1 ngs at 150:250, and 0.1 ngs at 150:150, all with M1 as the POI) that yielded negative log values (−0.492, −0.514, and −2.01, respectively) reflect a highly degraded POI with low peak heights and considerable allelic dropout. When considering the F2:M2 mixtures, the three outliers (0.1 ngs at 150:150 with F2 as the POI, and 0.1 ngs at P:250 and 0.25 ngs at P:150 with M2 as the POI) that yielded negative log values (−0.754, −1.83, and −5.63, respectively) again reflect a highly degraded POI with low peak heights and considerable allelic dropout. Overall, the collective findings clearly illustrate the benefits of using a PGing approach when encountering low quality STR profiles. Manual interpretation erodes quickly when encountering challenging profiles, often resulting in inconclusive findings, whereas PGing results in a greater percentage of positive outcomes that help the trier of fact in an investigation [5].
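The correspondence between the reported log(LR) values and the LRs quoted for the negative outliers can be checked directly. The Python sketch below uses only the log values given in the text; the function name is illustrative.

```python
def lr_from_log10(log_lr):
    """Convert a log10 likelihood ratio back to the LR scale."""
    return 10.0 ** log_lr

# log(LR) values quoted in the text for outliers with LR below 1.
for log_lr in (-0.492, -0.754, -1.83, -5.63):
    print(f"log(LR) = {log_lr:>6}  LR = {lr_from_log10(log_lr):.2E}")
```

An LR below 1 (a negative log value) means the evidence is more probable if the POI is not a contributor, while an LR above 1 supports the POI as a contributor.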

Fusion 6C reference profiles for each of the four donor samples are provided in Supplemental Table 2, including the number of alleles per autosomal locus, per pair of donors; on average, 3.09 alleles per locus for M1:F1 and 3.22 per locus for F2:M2 mixtures. When comparing the electropherograms (Supplemental Fig. 4) associated with log data for the 1:1 ratio F2:M2 mixtures in Fig. 2A and B, values were clearly impacted by a reduction in profile quality with decreasing amounts of starting template (Suppl Fig. 4A), when the POI was associated with high amounts of template and increasing levels of degradation (Suppl Fig. 4B), and when the POI was associated with low amounts of template and increasing levels of degradation (Suppl Fig. 4C); electropherograms for the remaining four dye channels for the examples in Supplemental Fig. 4 are provided in Supplemental Fig. 5. As expected, and as illustrated in Supplemental Fig. 6, the effects were compounded as the ratio of the minor contributor decreased when considering moderate amounts of template (0.25 ngs) and the highest degradation discrepancy (P:150); the electropherograms for the remaining dye channels can be found in Supplemental Fig. 7.

The overall impact of limited amounts of degraded template on profile quality, and subsequent MaSTR™ output, is clearly illustrated in Fig. 3, including as the ratio of the minor contributor decreases. The 288 log values are coded, with the key as follows: individual values (link to each “sample”) are associated with contributor A (M1 or F2 as the POI and the major profile) or B (M2 or F1 as the POI and the minor profile), with an amount (“amt”) of total input template of X (0.5 ngs, square data points), Y (0.25 ngs, triangle data points), or Z (0.1 ngs, circle data points), and in all cases, with a ratio of contributors of 1:1, 1:3, 1:6, or 1:10, and level of degradation of P:P, P:250, P:150, 250:250, 150:250, or 150:150 (the latter presented as six panels in the figure). When the contributor was less degraded and associated with the major profile (contributor A), the log values were generally higher, as expected. As the minor contributor became more degraded (for example, the right halves of the P:250 and P:150 panels), the log values dropped significantly. This effect worsened as the amount of input template decreased (BZ1:10, dark red colored circles in each of the six panels), especially as the degradation level of each contributor increased (250:250, 150:250, and 150:150), as illustrated by the electropherograms presented in Supplemental Fig. 8 and Supplemental Fig. 9 for 1:1 mixtures of F2:M2 with elevated levels of degradation. Most importantly, MaSTR™ was able to generate meaningful log values that supported true contributors for the majority (68–75%) of the two-person mixtures, determined as a log value greater than five for the original dataset when performing 10,000 iterations without and with a conditioning profile, respectively. This was consistent with expectations, and thus, supports MaSTR™ as a reliable PGing software package for the forensic community when analyzing simple and challenging two-person mixtures, including those mixtures with differentially degraded sources of DNA.

Fig. 3

Plot of log(LR) values for individual samples, reflecting the 288 data points associated with MaSTR™ analysis of two-person mixtures at 10,000 iterations and with inclusion of a conditioning profile as the second contributor. The code associated with the key is as follows: “sample” associated with A (when M1 or F2 are the POI and the major profile) or B (when M2 or F1 are the POI and the minor profile), “amt” of input template associated with X (0.5 ngs, square data points), Y (0.25 ngs, triangle data points), or Z (0.1 ngs, circle data points), and in all cases, with the ratio of contributors as 1:1, 1:3, 1:6 or 1:10, and level of degradation associated with P:P, P:250, P:150, 250:250, 150:250, or 150:150. Each of the 24 categories (for example, AX1:1) is color coded in the figure key

Analysis of two-person mixture profiles at 40,000 iterations

A comparison of MaSTR™ log(LR) values, when performing 10,000 versus 40,000 iterations per chain, resulted in several interesting findings. Figure 4 is a series of violin plots reflecting the difference between the log values for the 10,000 and 40,000 iteration datasets; log(LR) values for the 40,000 iteration data subtracted from the values for the 10,000 iteration data. The impact of POI, template amount, mixture ratio, and degradation level were assessed independently. Figure 4A1 represents a violin plot for the POI assessment without data points, while 4A2 provides both the data points and the mean (solid line through the plot). The remaining plots Fig. 4B–D include both the data points and the mean. The overall mean was ~ 0.66 log units (a difference of ~ 4.5 in the LR), slightly favoring the 10,000 iteration approach when calculating LRs.

Fig. 4

Violin plots comparing the 10,000 and 40,000 iteration datasets; a conditioning profile was included in each MaSTR™ analysis. Panel (A1) provides the plot without data points or the mean, while (A2) (and all other plots) provides data points and the mean for the respective datasets. In general, means were similar when considering the POI ((A2) with A as M1 and F2 and B as F1 and M2), the amount (amt) of template added to the PCR ((B) at 0.1, 0.25, or 0.5 ngs), the ratio of contributors ((C) at 1:1, 1:3, 1:6, or 1:10), and the group associated with level of degradation ((D) at P:P, P:250, P:150, 250:250, 150:250, or 150:150). The y-axis in each case reflects the difference (number of log units) between each sample in the two datasets, calculated as log(LR) values for the 40,000 iteration MaSTR™ analysis subtracted from the log(LR) values for the analysis performed at 10,000 iterations

The difference in log values was similar regardless of the source of the POI (Fig. 4A2, ~ 0.52 log units when the POI is A and ~ 0.80 log units when the POI is B). A normality test (see the “Materials and methods” section) clearly indicated that the data lack a normal distribution and variance. Therefore, a non-parametric test (Kruskal–Wallis), which compares datasets based on ranks, was performed to determine whether the datasets are significantly different. The median for the dataset associated with POI A was 0.22 and was 0.05 for POI B, with a p-value of 0.7438 when comparing the two datasets. Most of the comparisons confirmed that using either 10,000 or 40,000 iterations was acceptable; ~ 91.3% of the log values were within ± 2.5 log units (LR difference of ~ 316), and ~ 70.8% of the values were within ± 1.0 log unit (LR difference of 10). However, the distribution of the data exhibited some dissimilarities. Overall, most of the differences in log values (~ 70.1%) favored the 10,000 iteration approach, including a greater percentage of data points favoring B as the POI (~ 74.3%) rather than A as the POI (~ 66.0%), consistent with the slightly higher mean for B. Of the 25 outliers (difference of > 2.5 log units), most (72%) were associated with B, the more degraded and minor POI (F1 and M2), and all but two favored the 10,000 iteration approach. The two outliers that exhibited lower log values for the 10,000 iteration analysis were AX1:6 P:250 and AX1:10 P:P, which is puzzling, as these analyses should have been robust; they represent 0.5 ngs of amplified template with the POI as the pristine, major contributor. Replicate analysis (five replicates via MaSTR™) on the 25 outliers confirmed that variance in LRs was relatively low and not a driving factor in the differences. Instead, the data reflect higher LRs for the 10,000 iteration data (i.e., favoring a contributor) than for the 40,000 iteration data (Supplemental Table 3); 88% of the replicate averages for the 10,000 iteration data were higher. To expand on this assessment, violin plots associated with template amount, mixture ratio, and degradation level were analyzed.
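The summary statistics used in this comparison (mean difference, fraction within a tolerance, fraction favoring one approach, and outliers beyond 2.5 log units) can be sketched as follows. The differences below are hypothetical and are not values from the study; the function and key names are illustrative. The sign convention follows the text: the 40,000 iteration log(LR) is subtracted from the 10,000 iteration log(LR), so a positive difference favors the 10,000 iteration approach.

```python
def summarize_differences(diffs, outlier_cutoff=2.5):
    """Summarize log(LR) differences (10,000 minus 40,000 iterations)."""
    n = len(diffs)
    return {
        "mean": sum(diffs) / n,
        "within_1_log": sum(abs(d) <= 1.0 for d in diffs) / n,
        "favor_10k": sum(d > 0 for d in diffs) / n,
        "outliers": [d for d in diffs if abs(d) > outlier_cutoff],
    }

# Hypothetical differences in log units, not values from the study.
example = [0.05, 0.40, -0.10, 1.20, 3.10, 0.00, -2.80, 0.70]
summary = summarize_differences(example)
print(summary)

# A difference in log units corresponds to a fold difference in the LR.
print(round(10 ** 2.5))  # 316
```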

When evaluating the findings from Fig. 4B, the means for 0.1, 0.25, and 0.5 ngs of template were ~ 0.36, ~ 0.98, and ~ 0.65 log units, respectively; medians of 0.02, 0.44, and 0.28, respectively. A comparison of the 0.1- to 0.25-ng datasets indicated a significant difference (p-value of 0.0156), while the 0.25- to 0.5-ng comparison was marginally different (p-value of 0.0687) and the 0.1- to 0.5-ng comparison was not significant (p-value of 0.4974). Interestingly, most comparisons favoring the 40,000 iteration approach were associated with the 0.25- and 0.5-ng profiles (~ 68.6%), with increasing scatter when moving from the 0.1- to 0.5-ng data; 66/96, 38/96, and 30/96 data points within ± 0.15 log units, respectively. The lowest difference favoring 40,000 iterations in the 0.1-ng data was approximately −0.16 log units, with 27/96 data points favoring 40,000 iterations, while the average for the 0.25- and 0.5-ng data was approximately −0.34, with 59/192 data points favoring 40,000 iterations.

For Fig. 4C, the means for mixture ratios of 1:1, 1:3, 1:6, and 1:10 were ~ 0.82, ~ 0.79, ~ 0.66, and ~ 0.38 log units, respectively, with medians of 0.30, 0.12, 0.06, and 0.02, respectively. The 1:1 to 1:10 comparison was significant (p-value of 0.0495), while the 1:3 to 1:10 comparison was marginally different (p-value of 0.0879) and the remaining comparisons were not significant (p-values above 0.4676). Closer evaluation of the data shows that the vast majority of the outliers favoring the 10,000 iteration approach (22/23) were associated with a degraded POI and, for the 1:3, 1:6, and 1:10 mixtures, with the minor contributor; 5 of the 8 outliers associated with the 1:1 mixture were linked to the major contributor. Overall, the plot in Fig. 4C shows that the 10,000 iteration approach is favored for a mixture ratio of 1:1, while the 40,000 iteration approach is favored as the mixture ratio increases.

The greatest impact on the differences between log units when comparing the 10,000 and 40,000 iteration approaches was clearly the level of template degradation (Fig. 4D). Means increased significantly from the P:P to 150:150 datasets: 0.09, 0.36, 0.70, 0.76, 0.92, and 1.15, respectively, with medians of 1.2E−4, 5.1E−4, −3.0E−3, 0.53, 0.67, and 0.91, respectively. Comparisons between the P:P dataset and the three datasets associated with degraded mixtures were significant (p-values ranging from 6.8E−5 to 2.7E−7), comparisons between the P:250 or P:150 datasets and the three datasets associated with degraded mixtures were also significant (p-values ranging from 2.3E−3 to 8.1E−6), and the comparisons between P:P and P:250 or P:150, P:250 and P:150, and between the three datasets associated with degraded mixtures were not significant (p-values ranging from 0.78 to 1.0). The vast majority of data points (~ 91.7%) were within ± 0.25 log units (LR difference of ~ 1.78) for the P:P mixtures. While two of the 25 outliers (AX1:10 and BY1:6) were associated with the pristine mixtures, most were associated with degraded sources of template DNA. Therefore, it appears that the 10,000 iteration approach is favored when dealing with degraded sources of DNA and when the POI is associated with the minor contributor as the mixture ratio increases. Overall, these findings suggest that running eight chains of 10,000 and/or 40,000 iterations, when performing MaSTR™ analysis on poor-quality, two-person mixtures with a minor contributor as the POI, is a reliable approach.
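The "LR difference" figures quoted alongside the log-unit thresholds follow directly from the base-10 logarithm: a difference of d log units corresponds to a 10^d-fold difference in LR. The sketch below reproduces the three conversions used in the text; the function name is a hypothetical label chosen for illustration.

```python
def fold_difference(log_units):
    """Convert a difference in log10(LR) units to a fold difference in LR."""
    return 10 ** log_units

# Thresholds quoted in the text: 0.25, 1.0, and 2.5 log units.
for threshold in (0.25, 1.0, 2.5):
    print(f"{threshold} log units = {fold_difference(threshold):.2f}-fold")
```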

Analysis of two-person mixtures with or without a conditioning profile

A comparison of MaSTR™ log(LR) values, with or without a conditioning profile, was generally consistent with expectations. As an example, Fig. 5 illustrates the data for 1:1 (panel A) and 1:10 mixtures (panel B). The average difference in log values between analyses with and without a conditioning profile was 5.49 and 1.44 for the 1:1 and 1:10 mixtures, respectively. This is expected, since it is more challenging to deconvolute 1:1 mixtures without the aid of a conditioning profile and less demanding to interpret 1:10 mixtures, especially when the POI is associated with the major profile. When assessing the data in Fig. 5A, the greatest difference in log values is for the P:P mixtures, as both sources of DNA provide robust profile information. As the DNA associated with the second contributor becomes more degraded (B as the POI) and is present in limited quantities (Y or Z, 0.25 or 0.1 ngs), the difference between the log values decreased but remained relatively high (> 5 log units). When comparing the electropherograms associated with the M1:F1 and F2:M2 mixtures, the 1:1 mixtures were closer to the predicted value for the F2:M2 mixture (Fig. 5), while the M1:F1 mixtures were between 1:1 and 1:2. The data in Fig. 5A reflect this difference, as the data points associated with the F2:M2 mixture gave the greatest difference in log values. Therefore, as more information becomes available, the MaSTR™ software is less impacted when a conditioning profile is excluded from the analysis.
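The difference plotted in Fig. 5 is a simple subtraction, log(LR) with a conditioning profile minus log(LR) without one, averaged per mixture ratio. A minimal sketch of that calculation follows. The function names and all log(LR) values are hypothetical placeholders chosen for illustration, not results from the study.

```python
def conditioning_gain(log_lr_with, log_lr_without):
    """Difference in log10 units: with conditioning minus without."""
    return log_lr_with - log_lr_without

def mean(values):
    return sum(values) / len(values)

# Hypothetical (with, without) log10(LR) pairs grouped by mixture ratio.
pairs_by_ratio = {
    "1:1": [(28.0, 22.5), (25.0, 19.4)],
    "1:10": [(27.0, 25.5), (20.0, 18.7)],
}

mean_gain = {
    ratio: mean([conditioning_gain(w, wo) for w, wo in pairs])
    for ratio, pairs in pairs_by_ratio.items()
}
for ratio, gain in mean_gain.items():
    print(f"{ratio}: mean gain of {gain:.2f} log units")
```

A larger gain means the conditioning profile contributed more to resolving the mixture, which is the pattern the text describes for 1:1 relative to 1:10 mixtures.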

Fig. 5

Comparison of log(LR) values for MaSTR™ analysis with and without a conditioning profile in the mixture. The code associated with the key is as follows: A designates when M1 or F2 is the POI and the major profile, B is when M2 or F1 is the POI and the minor profile, X is 0.5 ngs of template, Y is 0.25 ngs of template, Z is 0.1 ngs of template, 1 is M1:F1 mixtures, and 2 is F2:M2 mixtures. The ratio of contributors is either 1:1 (A) or 1:10 (B). The x-axis is level of degradation (P:P, P:250, P:150, 250:250, 150:250, or 150:150). Each of the 12 categories (for example, AX1) is color coded in the figure key, with the same color for the 1 and 2 mixtures (for example, AX1 and AX2) differentiated with a line around the circle and different shading. The y-axis in each case reflects the difference (number of log units) between each sample in the two datasets, calculated as log(LR) values for MaSTR™ analysis performed without a conditioning profile subtracted from the log(LR) values for the analysis with a conditioning profile.
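The three-character category codes defined in the key above (for example, AX1) can be decoded mechanically. The sketch below is an illustrative helper only; the function and dictionary names are hypothetical, and the mapping is taken directly from the caption.

```python
from itertools import product

# Mappings taken from the figure key; the names are illustrative.
TEMPLATE_NG = {"X": 0.5, "Y": 0.25, "Z": 0.1}
MIXTURE = {"1": "M1:F1", "2": "F2:M2"}
ROLE = {"A": "major", "B": "minor"}
# POI identity follows from the role and the mixture (A: M1 or F2; B: F1 or M2).
POI = {("A", "1"): "M1", ("A", "2"): "F2", ("B", "1"): "F1", ("B", "2"): "M2"}

def decode_sample_code(code):
    """Decode a category code such as 'AX1' into its components."""
    if len(code) != 3:
        raise ValueError(f"expected a 3-character code, got {code!r}")
    role, template, mixture = code
    return {
        "poi": POI[(role, mixture)],
        "role": ROLE[role],
        "template_ng": TEMPLATE_NG[template],
        "mixture": MIXTURE[mixture],
    }

# All 12 categories referred to in the caption.
all_codes = ["".join(parts) for parts in product("AB", "XYZ", "12")]
print(len(all_codes), decode_sample_code("AX1"))
```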

Conclusions

Analysis of STR mixture profiles, especially complex mixtures, continues to improve as new methods of PGing are developed and applied. The MaSTR™ software is a reliable tool for generating LRs for 2-person mixtures, including those that have differentially degraded sources of DNA; reliability when assessing non-contributors and varying number of contributors has been illustrated in a separate study (M.S. Adamowicz, T.N. Rambo, J.L. Clarke, Internal validation of the MaSTR™ probabilistic genotyping software for the interpretation of 2–5 person mixed DNA profiles, manuscript submitted for publication, personal communication). A total of 144 two-person mixtures, resulting in 864 total analyses, were assessed for the current study. Mixture ratios ranged from 1:1 to 1:10, including pristine sources of DNA and various combinations of artificially degraded sources of DNA (average size fragments of 150 or 250 bps). Quantities of DNA template were varied (0.1 to 0.5 ngs of total input) and PGing analysis was performed at 10,000 or 40,000 iterations. Overall, the MaSTR™ software performed as expected. The resulting LR values for pristine mixtures were typically greater than 10^30 (log(LR) > 30). As with binary analysis, lower-quality mixture data associated with sources of DNA at ~ 0.05 ngs for each contributor resulted in peak imbalance and allelic dropout, which reduced the weight in support of a contributor to the mixture. This was exacerbated by higher levels of degradation. In some instances, this resulted in log(LR) values dropping to a level that provided weak evidence for exclusion (Figs. 1 and 2).

With all forms of PGing, users must determine the ideal number of MCMC iterations applied to perform the analysis. The total iterations are typically replicated through a series of chains; in this study, eight chains were applied for a total of 80,000 or 320,000 iterations. Too few iterations may result in inconsistent reproduction of LR values, and too many will unnecessarily consume analysis time. Interestingly, the overall data in this study slightly favored the use of 10,000 iterations per chain over 40,000 iterations. The 40,000 iteration approach was favored as the ratio of the mixture components increased (i.e., from 1:1 to 1:10), while the 10,000 iteration approach was favored when dealing with degraded sources of DNA and when the POI was associated with the minor contributor as the mixture ratio increased. Overall, these findings suggest that running eight chains of 10,000 and/or 40,000 iterations, when performing MaSTR™ analysis on poor-quality, two-person mixtures with a minor contributor as the POI, is a reliable approach and consistent with findings from other studies [34]. Nonetheless, laboratories may want to consider establishing different iteration levels depending on the nature of the mixture. Studies assessing three- and four-person mixtures with differentially degraded sources of DNA have been successfully conducted, including an assessment of optimal iteration levels, and are the subject of a manuscript in preparation.
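The iteration totals quoted above, and the trade-off between reproducibility and analysis time, can be illustrated with simple arithmetic. The sketch below computes the totals for eight chains and, as an idealized assumption that is not a result from this study, the factor by which Monte Carlo error would shrink if every draw were independent. MCMC draws are typically autocorrelated, so the practical gain is usually smaller. The function name is hypothetical.

```python
import math

CHAINS = 8  # number of chains reported in the study

def total_iterations(per_chain, chains=CHAINS):
    """Total MCMC iterations across all chains."""
    return per_chain * chains

low = total_iterations(10_000)
high = total_iterations(40_000)
cost_ratio = high / low

# Idealized assumption: with independent draws, Monte Carlo standard error
# scales as 1/sqrt(N), so N times the iterations gives a sqrt(N)-fold reduction.
ideal_error_reduction = math.sqrt(cost_ratio)

print(low, high, cost_ratio, ideal_error_reduction)
```

Under this idealization, four times the computation buys at most a twofold reduction in sampling error, which is one reason a lower iteration setting can be a reasonable choice when replicate LRs are already consistent.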