Introduction

The determination of genetic relatedness is frequently adopted in several forensic applications, such as kinship confirmation after separation, inheritance disputes between illegitimate children, immigration cases, and personal identification of missing persons, unknown bodies, and disaster victims [1,2,3]. In such cases, the pedigree structure can be determined by using a likelihood ratio (LR) method based on genetic marker data for a set of persons, so that the determination of a relationship and the identification of a person of interest (POI) are achieved.

Generally, two questions need to be answered before testing: (i) how many markers are needed, and (ii) how many reference relatives, and which ones, should be genotyped if there is a choice? Many studies have shown that adding genetic markers (STR and/or SNP) can improve discrimination between relatives and non-relatives [3,4,5,6,7]. However, the number of markers that can be added depends on the detection systems available in the laboratory. In addition, further testing may be impossible for DNA samples of limited quality and quantity, such as trace or degraded DNA. The first question therefore becomes whether the available kits or genetic data are sufficient for a kinship analysis. With respect to reference relatives, choosing the most informative references and/or typing more relatives can also improve the discrimination power of a genotyping system in kinship testing and missing person identification. Ge et al. [1] suggested that first-degree relatives (parents and full siblings) are the preferred references and that references with less genetic dependence are superior to those with more genetic dependence. However, the most suitable candidates, e.g., the parent(s), may not be available, so that more distant relatives need to be genotyped. A well-known example is the “Missing Grandchildren of Argentina”, where the biological parents of the POI were murdered and their bodies remain missing [8]. Conversely, many relatives may be available, say ten full siblings, and it is conceivably unnecessary to genotype all of them. Furthermore, if a reliable conclusion cannot be reached after initial testing, further data must be gathered by recruiting additional family members. Prioritization problems then arise because adding one relative may provide higher discrimination power than adding another [3, 9]. Last but not least, different labs may use different thresholds to confirm a relationship [10, 11]. Selecting a lower threshold decreases the false negative rate (FNR) but at the cost of increasing the false positive rate (FPR). A higher threshold generally results in higher accuracy but lower effectiveness [12]. Accordingly, the number of kits/markers and reference relatives needs to be increased to reach an explicit and reliable conclusion. Beyond these questions, data interpretation also matters. Besides the LR itself and the corresponding posterior probability, more parameters are needed to interpret DNA evidence comprehensively in court.

Although several useful tools, such as Familias [13], EasyDNA [14], forrel [9], Bonaparte [15, 16], and Converge Software [17], have been developed, they have not addressed the issues mentioned above satisfactorily. For example, Familias is useful for LR calculation and simulation, but it does not provide solutions for choosing reference relatives. The R package forrel, which uses a conditional simulation method, is a good tool for prioritizing additional family members for genotyping in missing person cases. However, it is not friendly to laypeople, particularly those unfamiliar with coding. Therefore, we developed a flexible and user-friendly online tool, EasyKin, for forensic kinship analysis and missing person identification. This tool has several promising features. First, it can be used to estimate the system power for a specific set of markers and reference relatives at the consultation and commissioning stage. Importantly, the system power of subsets of the available references can also be evaluated, making it easy to choose appropriate references or combinations of them. Second, two mutually exclusive hypotheses can be constructed easily and presented intuitively with just a few mouse clicks. Finally, the user interface (UI) is an HTML-based dashboard, which is friendly to both professional and non-professional users and can be used anytime and anywhere.

Methods

Pedigrees and references

For simplicity in pedigree construction, 1st and 2nd degree relatives as well as several genetically unrelated individuals can be chosen. In the current version, reference relatives include father/mother, 0–6 children (0–3 sons and 0–3 daughters), paternal grandparent(s), 0–3 paternal uncles, 0–3 paternal aunts, maternal grandparent(s), 0–3 maternal uncles, 0–3 maternal aunts, 0–6 full siblings, 0–6 paternal half siblings, 0–6 maternal half siblings, 0–6 grandchildren (the children of a son), 0–6 grandchildren (the children of a daughter), 0–6 nephews/nieces (the children of a brother), and 0–6 nephews/nieces (the children of a sister). Several genetically unrelated individuals can also be included, e.g., spouse, the mother of a paternal half sibling, the father of a maternal half sibling, daughter-in-law, son-in-law, brother-in-law, and sister-in-law. Theoretically, more than one billion scenarios can be constructed, covering the majority of common cases. For more complex scenarios, say those involving incest, users are encouraged to upload their own pedigrees following the instructions in the user guide.

In order to determine the most appropriate reference relative(s), a pruning function is implemented in this tool. During this process, each reference is pruned one by one from the original pedigree, thus generating a series of subsets of them. For example, if three references are available, all possible subsets/pedigrees are POI+S1+S2, POI+S1+S3, POI+S2+S3, POI+S1, POI+S2, and POI+S3 (Fig. 1).

Fig. 1

An example of pedigree pruning
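The enumeration of pruned pedigrees described above can be sketched in a few lines. The snippet below is an illustration written in Python (EasyKin itself is implemented in R), and the function name `pruned_subsets` is hypothetical rather than part of the tool. It lists every pedigree obtained by removing one or more references while keeping at least one reference alongside the POI.

```python
from itertools import combinations


def pruned_subsets(references, poi="POI"):
    """List every pruned pedigree: the POI plus a proper, non-empty
    subset of the available references (hypothetical helper)."""
    subsets = []
    # Go from the largest proper subsets down to single references.
    for size in range(len(references) - 1, 0, -1):
        for combo in combinations(references, size):
            subsets.append("+".join((poi,) + combo))
    return subsets


print(pruned_subsets(["S1", "S2", "S3"]))
# ['POI+S1+S2', 'POI+S1+S3', 'POI+S2+S3', 'POI+S1', 'POI+S2', 'POI+S3']
```

With n references this produces 2^n − 2 pruned pedigrees, so the number of subsets to simulate grows quickly as references are added.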

Simulation

The alleles of founders (i.e., individuals without parents in the pedigree) are randomly assigned according to the allele frequencies of each locus. All markers are assumed to be unlinked and in Hardy-Weinberg and linkage equilibrium. Each parent transmits one of his or her two alleles to each offspring with equal probability. Mutations are also incorporated, with a higher rate for paternal than for maternal mutation (e.g., 3–5 fold). After genotypes have been assigned to the whole pedigree, we remove the genotypes of individuals who are not available for testing. By default, one hundred pedigrees are simulated under the two mutually exclusive hypotheses, but the number can be increased if necessary.
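The simulation steps above can be sketched for a single locus as follows. This is a minimal Python illustration under stated assumptions, not the EasyKin implementation: the function name `simulate_locus`, the pedigree encoding, the mutation rates (a 4-fold paternal excess), and the uniform "Equal"-type mutation step are all chosen for the example.

```python
import random


def simulate_locus(pedigree, freqs, typed, mu_pat=0.002, mu_mat=0.0005, rng=random):
    """Gene-drop one unlinked locus through a pedigree (illustrative sketch).

    pedigree: dict id -> None (founder) or (father_id, mother_id),
              ordered so that parents appear before their children.
    freqs:    dict allele -> population frequency.
    typed:    ids whose genotypes are kept (available for testing).
    mu_pat / mu_mat: assumed paternal and maternal mutation rates.
    """
    alleles = list(freqs)
    weights = [freqs[a] for a in alleles]

    def transmit(parent_genotype, mu):
        # Each of the two parental alleles is passed on with probability 1/2.
        allele = rng.choice(parent_genotype)
        if rng.random() < mu:
            # Assumed "Equal"-type mutation: move to any other allele uniformly.
            allele = rng.choice([a for a in alleles if a != allele])
        return allele

    genotypes = {}
    for person, parents in pedigree.items():
        if parents is None:
            # Founders: two independent draws from the allele frequencies (HWE).
            genotypes[person] = tuple(rng.choices(alleles, weights=weights, k=2))
        else:
            father, mother = parents
            genotypes[person] = (
                transmit(genotypes[father], mu_pat),
                transmit(genotypes[mother], mu_mat),
            )
    # Drop the genotypes of individuals who are not available for testing.
    return {p: tuple(sorted(g)) for p, g in genotypes.items() if p in typed}


pedigree = {"GF": None, "GM": None, "F": ("GF", "GM"), "M": None, "POI": ("F", "M")}
freqs = {"12": 0.3, "13": 0.5, "14": 0.2}
print(simulate_locus(pedigree, freqs, typed={"GF", "POI"}, rng=random.Random(1)))
```

Repeating the call for every locus and for the chosen number of pedigrees gives one simulated data set per hypothesis.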

LR calculation

Kinship is assessed by comparing two alternative hypotheses: H1: person of interest (POI) is the specific member of the putative pedigree and H2: POI is unrelated to the putative pedigree. The likelihood ratio (LR) is calculated as follows:

$$\textrm{LR}=\frac{P\left(E|H1\right)}{P\left(E|H2\right)}$$

where E represents the DNA evidence, i.e., the joint DNA profiles (e.g., STR) of all tested samples, and P(E|H) represents the probability of the DNA evidence under each hypothesis (H1 or H2). Likelihoods are calculated using the Elston–Stewart (E-S) algorithm [18], which is implemented in the R package Familias [19].
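For intuition, the LR can be worked out by hand in the simplest case. The sketch below is a Python illustration and not the Elston–Stewart algorithm used by Familias. It computes the single-locus LR for a parent-child pair against unrelatedness, ignoring mutation, so that the LR reduces to P(child genotype | parent genotype) / P(child genotype). All function names and allele frequencies are assumptions made for the example.

```python
from math import prod


def genotype_prob(genotype, freqs):
    """Hardy-Weinberg probability of an unordered genotype."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def prob_child_given_parent(child, parent, freqs):
    """P(child genotype | one typed parent), other parent untyped, no mutation."""
    c1, c2 = child
    total = 0.0
    for transmitted in parent:  # each parental allele is passed on with prob. 1/2
        if c1 == c2:
            total += 0.5 * (transmitted == c1) * freqs[c1]
        else:
            total += 0.5 * ((transmitted == c1) * freqs[c2]
                            + (transmitted == c2) * freqs[c1])
    return total


def parent_child_lr(child, parent, freqs):
    """Single-locus LR for H1: parent-child versus H2: unrelated."""
    return prob_child_given_parent(child, parent, freqs) / genotype_prob(child, freqs)


freqs = {"12": 0.1, "13": 0.7, "14": 0.2}
lr_locus = parent_child_lr(("12", "14"), ("12", "13"), freqs)
print(lr_locus)  # equals 1 / (4 * 0.1) for this allele-sharing pattern

# Unlinked loci in linkage equilibrium: the combined LR is the product.
print(prod([lr_locus, parent_child_lr(("12", "12"), ("12", "13"), freqs)]))
```

Without a mutation model, a pair that shares no allele at a locus has LR = 0 there. This is one reason mutation models are included in the actual calculation.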

System power estimation

With simulated pedigrees, an empirical log-normal distribution of LRs for H1 and H2 can be obtained. Then, the probability of log10LR at a threshold (t) can be easily estimated using the function pnorm in R. Hypotheses are supported based on the following threshold ranges (t1 and t2; t1 < t2): (i) H1 true: log10LR > t2, (ii) H2 true: log10LR < t1, and (iii) inconclusive: t1 ≤ log10LR ≤ t2. Accordingly, several parameters are calculated for the estimation of system power, including sensitivity (Sen), specificity (Spe), positive predictive value (PPV), negative predictive value (NPV), false positive rate (FPR), false negative rate (FNR), inconclusive, and effectiveness. They are defined as follows:

  • Sen: proportion of pedigrees under H1 judged as H1 true;

  • Spe: proportion of pedigrees under H2 judged as H2 true;

  • PPV: proportion of pedigrees correctly judged as H1 true;

  • NPV: proportion of pedigrees correctly judged as H2 true;

  • FPR: proportion of pedigrees under H2 judged as H1 true;

  • FNR: proportion of pedigrees under H1 judged as H2 true;

  • Inconclusive: proportion of pedigrees that cannot be judged as either H1 true or H2 true;

  • Effectiveness: proportion of pedigrees that can be judged as H1 true or H2 true.

For details on how these metrics are calculated, please refer to Supplementary Table 1.
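As a rough sketch of how such metrics could be derived from simulated values, the Python snippet below fits a normal distribution to log10LR under each hypothesis and evaluates its cumulative distribution at the thresholds, in the spirit of the `pnorm` approach described above. It is not the EasyKin code, and it does not reproduce Supplementary Table 1. In particular, computing PPV and NPV with equal numbers of pedigrees under H1 and H2, and averaging the inconclusive proportion over both hypotheses, are assumptions made here.

```python
from statistics import NormalDist, mean, stdev


def system_power(log10lr_h1, log10lr_h2, t1, t2):
    """Estimate system-power metrics from simulated log10(LR) values (sketch)."""
    h1 = NormalDist(mean(log10lr_h1), stdev(log10lr_h1))
    h2 = NormalDist(mean(log10lr_h2), stdev(log10lr_h2))

    sen = 1 - h1.cdf(t2)   # H1 pedigrees with log10LR > t2
    fnr = h1.cdf(t1)       # H1 pedigrees with log10LR < t1
    spe = h2.cdf(t1)       # H2 pedigrees with log10LR < t1
    fpr = 1 - h2.cdf(t2)   # H2 pedigrees with log10LR > t2

    # Assumption: equal numbers of pedigrees simulated under H1 and H2.
    ppv = sen / (sen + fpr) if sen + fpr > 0 else float("nan")
    npv = spe / (spe + fnr) if spe + fnr > 0 else float("nan")
    inconclusive = ((h1.cdf(t2) - h1.cdf(t1)) + (h2.cdf(t2) - h2.cdf(t1))) / 2

    return {
        "Sen": sen, "Spe": spe, "PPV": ppv, "NPV": npv,
        "FPR": fpr, "FNR": fnr,
        "Inconclusive": inconclusive, "Effectiveness": 1 - inconclusive,
    }


# Toy log10(LR) values, invented for illustration only.
power = system_power([3, 4, 5, 6, 7], [-7, -6, -5, -4, -3], t1=-2, t2=2)
for name, value in power.items():
    print(f"{name}: {value:.4f}")
```

For a single-threshold analysis, the same function can be called with t1 equal to t2, in which case the inconclusive proportion is zero.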

Effectiveness indicates the proportion of cases that will be successfully resolved with the defined thresholds. It is a good indicator of overall performance and is classified into four levels: < 0.8 as unsatisfactory, > 0.8 to 0.9 as acceptable, > 0.9 to 0.99 as good, and > 0.99 as perfect.
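The four levels can be expressed as a small lookup. The Python helper below is hypothetical and purely illustrative. The text does not say how a value of exactly 0.8 is classified, so treating it as unsatisfactory is an assumption.

```python
def effectiveness_level(effectiveness):
    """Map an effectiveness value in [0, 1] to its qualitative level."""
    if effectiveness > 0.99:
        return "perfect"
    if effectiveness > 0.9:
        return "good"
    if effectiveness > 0.8:
        return "acceptable"
    # Assumption: a value of exactly 0.8 falls into the lowest level.
    return "unsatisfactory"


for value in (0.7866, 0.9152, 0.9989):
    print(value, effectiveness_level(value))
```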

Implementation

The user interface (UI) of EasyKin is an HTML-based dashboard using shinydashboard (version 0.7.1), which leverages functions from the R package shiny (version 1.5) for the application. Familias is utilized for pedigree construction and LR calculation. Package DT provides an R interface to the JavaScript library DataTables and is used for data presentation. The UI can be accessed from commonly used web browsers (e.g., Google Chrome, Microsoft Edge, Mozilla Firefox, and Apple Safari) and may be utilized from desktop, tablet, or smartphone devices at https://forensicsysu.shinyapps.io/EasyKin/ (a stand-alone version is also available at https://github.com/Ryan620/Easykin). An example of user interface of EasyKin is shown in Fig. 2. All simulations, calculations, and presentations in this tool are performed using R programming.

Fig. 2
figure 2

An example of user interface of EasyKin. There is a sidebar on the left (black background) and a main interface on the right (white background). In the main interface, person of interest (POI) and references are on the left column and pedigrees under H1 and H2, LR distribution, and parameters of system power are on the right column. Users can select one or more references by using the drop-down menus (a paternal grandfather, a paternal uncle, a paternal half-brother, and his mother are included in this example). Pedigrees can be constructed automatically on the top right once reference relatives are specified (shadowed). Then, a histogram of log10(LR) will be plotted after simulation (by clicking the button “OK” on the lower left). Simultaneously, parameters of system power based on user-defined thresholds are shown on the lower right. Users may change the thresholds by just dragging the slider

Results

First, a general workflow is recommended in Fig. 3 for kinship testing and missing person identification using EasyKin. Step 1: Construct two alternative hypotheses with available reference relatives. Step 2: Generate a number of virtual families according to the allele frequencies of STR markers, which are included in the available kits in one lab. An empirical log-normal distribution of LRs for H1 and H2 can be obtained with these simulated pedigrees. Then, by setting appropriate thresholds, parameters of system power, i.e., Sen, Spe, PPV, NPV, FPR, FNR, Inconclusive, and Effectiveness, are estimated. Step 3: Users can now make a decision on which kit(s) and reference relative(s) should be included after balancing FPR and FNR as well as Effectiveness. Considering that there may be cases where many references are available and possibly not all of them are necessary, the pruning function in EasyKin can be used to choose the most informative subsets of them. Step 4: Process sample collection, DNA genotyping, and LR calculation. Step 5: Evaluate the weight of evidence and prepare for data interpretation, including LR itself, posterior probability, a corresponding verbal equivalent, and parameters of system power under the specific LR.

Fig. 3
figure 3

A general workflow for kinship testing and missing person identification using EasyKin

Next, we will present three examples to demonstrate how kinship testing and missing person identification can be improved with EasyKin.

Example 1—pairwise full sibling testing

Pairwise full sibling testing is the second most common type of test in forensic practice after paternity testing. We assume that a forensic lab has three STR sets, i.e., AmpFlSTR Identifiler (Thermo Fisher Scientific, San Francisco, CA, USA), Huaxia Platinum (Thermo Fisher Scientific, San Francisco, CA, USA), and Microreader 23sp (Suzhou Microread Genetics, Jiangsu, China). The performance of the three kits can be evaluated based on simulation data with EasyKin. According to [12], thresholds of t1 = −2 and t2 = 2 are required. Under these thresholds, none of these kits can individually reach perfect effectiveness (> 0.99); this is only achieved by combining them, e.g., Set 4 and Set 5 (Table 1). Therefore, a combination of AmpFlSTR Identifiler + Microreader 23sp or Huaxia Platinum System + Microreader 23sp is suggested for pairwise full sibling testing in this lab. Although the LR values of individual cases may reach the defined thresholds with a single kit, a sufficient set of markers is still suggested for a lower error rate (Table 1). Conversely, stricter (higher) thresholds are suggested when low-power systems are used, which may seem counterintuitive.

Table 1 System power for pairwise full sibling testing using different STR sets. Thresholds: t1 = −2 and t2 = 2

Example 2—personal identification of an unknown body

A man was found dead 20 years ago, and his body was cremated after being genotyped with AmpFlSTR Identifiler (Thermo Fisher Scientific). His identity remained unknown for years, until a man recently claimed to be his (full) brother. We were commissioned to confirm their relationship.

In this case, the marker set cannot be expanded because no DNA of the POI remains. We first evaluated the performance of pairwise full sibling testing with the 15 STR loci included in AmpFlSTR Identifiler. As shown in Table 1 (Set 1), the effectiveness was unsatisfactory (0.7866) and the error rate was relatively high, i.e., FPR = 0.0005 and FNR = 0.0014. Therefore, we requested that more reference relatives participate in the test and were informed that only an aunt of the POI was available. After adding the aunt, the effectiveness increased to 0.9152 and the error rate decreased significantly, with FPR < 0.0001 and FNR = 0.0005, indicating that this set of references improved the performance. Then, blood samples of the two references, along with the DNA profile of the POI, were sent to our lab, and the references were genotyped with Goldeneye 25A (Peoplespot, Beijing, China). Genotypes and LRs are listed in Table 2. The combined LR (CLR) was 119.2377 (log10CLR = 2.0764), exceeding the upper threshold (t2 = 2) and thus supporting their relationship. We also calculated the LR for the POI and his brother alone, which fell between t1 and t2 and was thus inconclusive. Pre-estimation with EasyKin is therefore helpful to guide the test and can avoid multiple rounds of sampling in forensic casework (i.e., the relevant samples are collected at one time rather than successively).

Table 2 Genotypes and LR values in Example 2. POI: the deceased man; S1: putative aunt of POI; S2: putative full sibling of POI; POI was genotyped with AmpFlSTR Identifiler while S1 and S2 were genotyped with Goldeneye 25A, which covers all the markers in AmpFlSTR Identifiler

Example 3—inheritance dispute

In an inheritance dispute case, a boy (POI) claimed to be the child of a deceased man. The mother of POI (known), putative grandparents, a putative paternal half-brother, and his mother were available for the test.

This kind of scenario is frequently encountered in practice, and we need to determine who should be included before testing. First, we need to evaluate the performance if all these references are genotyped using a certain kit, e.g., Huaxia Platinum System. In this case, we require stricter thresholds with t1 = −4, t2 = 4 and effectiveness > 0.99. Pedigrees under H1 and H2, the LR distribution, and the corresponding system power are shown in Fig. 4. We can anticipate that ~99.89% of cases (effectiveness) will pass the defined thresholds with very low error rates, i.e., FPR < 0.0001 and FNR < 0.0001, indicating sufficient system power for the testing. In the next step, by pruning the pedigree, we find that the number of references can be reduced without a significant decrease in accuracy and effectiveness (Table 3). Four combinations, i.e., POI+S4+S5+S6+S3+S7, POI+S4+S5+S6+S3, POI+S4+S5+S6+S7, and POI+S4+S5+S6, have effectiveness > 0.99. The smallest number of references is achieved with the combination of POI, his mother, and both putative grandparents (POI+S4+S5+S6). Not surprisingly, POI+S4+S5+S6+S3 and POI+S4+S5+S6 have the same discrimination power because the two subsets are equivalent: S3 is a singleton in both H1 and H2 and provides no further information about the deceased man unless her son (S7) is also genotyped. If looser thresholds are defined, say t1 = −1, t2 = 1 and effectiveness > 0.99, the number can be reduced further to only two references (both putative grandparents) at the cost of a higher error rate (Supplementary Table 2). Therefore, it may not be necessary to genotype S3 and S7 (or S4, at the looser thresholds).

Fig. 4
figure 4

Pedigrees under H1 and H2, LR distribution, and system power for Example 3. Markers: 23 STRs in Huaxia Platinum System; thresholds: t1 = −4 and t2 = 4; simulations: n = 500; shadows indicate available references

Table 3 System power for different subsets after pruning the original pedigree in Example 3. Markers: 23 STRs in Huaxia Platinum System; thresholds: t1 = −4 and t2 = 4; simulations: n = 500; sample labeling of POI and S1–S7 corresponds to those in Fig. 4; rows colored gray represent subsets with effectiveness > 0.99; cells with “-” mean NULL outputs as the two hypotheses are equivalent

It is noteworthy that reducing the references may be problematic in this case if S5 and S6 have more than one child. If only POI+S4+S5+S6 or POI+S5+S6 are genotyped, we can only conclude that the POI is the grandson of S5 and S6 (if H1 is supported), not necessarily the child of the deceased man. From this point of view, whether reference pruning is appropriate depends on the circumstances of each case.

Discussion

In this paper, we introduced a flexible and user-friendly online tool, named EasyKin, for forensic kinship testing and missing person identification. The three examples demonstrated that if we estimate the system power in advance using EasyKin, appropriate kits and informative references can be easily determined before testing. It may be helpful to avoid multiple sampling and superfluous testing in real cases, thereby reducing time and economic cost.

Although it is possible that LRs of individual cases may reach the defined thresholds with a smaller number of STRs, a sufficient marker system (if available) is always suggested considering a higher accuracy at the same thresholds (Table 1). With regard to references, more relatives generally indicate higher system power, and typing as many of them as possible is encouraged. There are some more considerations to take into account. First, singleton individuals (e.g., spouses) are useless unless other specific relatives are genotyped. For example, S3 cannot provide any more information unless her son S7 is also genotyped in Example 3. Second, distant relatives (third degree or more distant relatives) can only provide limited discrimination capacities and are less recommended with conventional autosomal markers. That is why only 1st and 2nd degree relatives are included by default in the construction of pedigrees in EasyKin. Nevertheless, lineage markers residing on the Y chromosome and mitochondrial DNA (mtDNA) genome can still be used to increase the LR values for these distant kinship analyses [20, 21]. However, it may be challenging to perform additional amplifications in cold cases (Example 2) and forensic investigations of small amounts of DNA. Third, if many references are available, (possibly) not all of them are necessary and the pruning function in EasyKin can be used to choose the most informative subsets of them. Besides the scenario in Example 3 of this study, we also performed the pruning function for pedigree F9 in [22]. We found that the maternal aunt should not have been included as she provided no further increase in effectiveness (Supplementary Table 3).

In addition to genetic markers and references, the threshold also matters. Different labs may use different thresholds to confirm a relationship [10]. Previous works tend to focus only on inclusion and apply a single threshold [4, 23]. At present, double thresholds are widely used in China so that both FPR and FNR can be balanced. As recommended in Specification of parentage testing (GB/T 37223–2018) [24], Technical specification for identification of biological full sibling relationship (SF/T 0117–2021) [25], and Specification for identification of biological grandparent-grandchild relationship (SF/Z JD0105005–2015) [26], a relationship is affirmed if LR > 10,000 and rejected if LR < 0.0001; otherwise the result is inconclusive. In accordance with these specifications, EasyKin is designed to estimate the system power (Sen, Spe, PPV, NPV, FPR, FNR, Inconclusive, Effectiveness) under either single or double thresholds. However, with the above fixed LR-threshold method [27], t = 10,000 may be too high for cases with low statistical power and will lead to high false negative rates under a single threshold and low effectiveness under double thresholds. Therefore, lower thresholds were also applied in some studies [4, 12]. In addition, Marsico et al. proposed a flexible and case-specific LR threshold, named the LR decision threshold (DT) [22]. The DT approach allows underpowered pedigrees to be handled and yields thresholds with manageable FNR and FPR. The concept is similar to the estimation of the optimal cutoff in ROC curves. Although the authors did not intend to provide an LR threshold for reaching a conclusion in the identification process, the DT approach is very instructive for threshold determination.
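The double-threshold rule can be written as a three-way decision on the log10 scale. The Python function below is an illustrative sketch with a hypothetical name (`decide`). Its defaults correspond to LR > 10,000 and LR < 0.0001, and a single-threshold analysis is obtained by setting t1 equal to t2.

```python
from math import log10


def decide(lr, t1=-4.0, t2=4.0):
    """Classify an LR with double thresholds given on the log10 scale."""
    x = log10(lr)
    if x > t2:
        return "H1 supported"
    if x < t1:
        return "H2 supported"
    return "inconclusive"


# Combined LR from Example 2, judged under two threshold settings.
print(decide(119.2377, t1=-2, t2=2))  # H1 supported
print(decide(119.2377))               # inconclusive under t1 = -4, t2 = 4
```

The same LR therefore leads to different conclusions depending on the thresholds in force, which is why the thresholds should be fixed before testing.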

In real applications, interpretation of evidence is also crucial. To improve communication between forensic scientists and laypeople, EasyKin converts the likelihood ratio to verbal equivalents, which are often used to express the strength of evidence in court. According to [28], the proposed verbal scales are null support (LR = 1), weak or limited support (LR > 1–10), moderate support (LR > 10–100), strong support (LR > 100–1000), very strong support (LR > 1000–10,000), and extremely strong support (LR > 10,000). However, we would like to point out that a verbal scale should always be accompanied by a numeric expression of the value of evidence, especially when the value of the evidence is weak/limited. Besides the LR itself, the system power at a specific LR may also serve as a good metric for interpretation. With EasyKin, users can drag the slider to the proper position (using the single threshold mode) after LR calculation to evaluate the evidence for an individual case. If H1 is supported, Sen, PPV, and FPR are useful for data interpretation, while Spe, NPV, and FNR can be used if H2 is supported. Take the case in Example 2 as an example. Given the current LR value of 119.2377 (log10CLR = 2.0764) as the threshold, Sen, PPV, and FPR are 0.9120, 0.9999, and 0.0001, respectively. Correct rates (PPV and NPV) and error rates (FPR and FNR) may be, to some degree, more straightforward and easier to understand for jurors and lawyers. Therefore, these metrics can also be used for interpreting the value of evidence.
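The verbal scale quoted above, together with the posterior probability that usually accompanies an LR, can be sketched as follows. This Python snippet is illustrative only and is not taken from EasyKin. Applying the scale to 1/LR when LR < 1 (support for H2) and using a prior probability of 0.5 are assumptions of the example.

```python
# Upper bounds of the verbal categories quoted from [28].
SCALE = [
    (10, "weak or limited support"),
    (100, "moderate support"),
    (1000, "strong support"),
    (10000, "very strong support"),
]


def verbal_equivalent(lr):
    """Translate an LR into a verbal expression of evidential strength."""
    if lr <= 0:
        raise ValueError("LR must be positive")
    if lr == 1:
        return "null support"
    # Assumption: for LR < 1 the scale is applied to 1/LR in favour of H2.
    favoured = "H1" if lr > 1 else "H2"
    strength = lr if lr > 1 else 1 / lr
    for upper, label in SCALE:
        if strength <= upper:
            return f"{label} for {favoured}"
    return f"extremely strong support for {favoured}"


def posterior_probability(lr, prior=0.5):
    """Posterior probability of H1 from the LR and an assumed prior."""
    odds = lr * prior / (1 - prior)
    return odds / (1 + odds)


lr = 119.2377  # combined LR from Example 2
print(verbal_equivalent(lr))                # strong support for H1
print(round(posterior_probability(lr), 4))  # 0.9917
```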

We compared the performance of EasyKin with that of Familias [2] (desktop application), a popular and free software package for kinship analysis. Taking the case in Fig. 4 as an example, hypothesis construction takes only about 15 seconds (s) with EasyKin, approximately twelve-fold faster than with Familias (about 3 min). Therefore, fast and intuitive construction of hypotheses is one of the main advantages of EasyKin. With respect to the speed of simulations, EasyKin took 5.33 s, 52.24 s, and 520.00 s for 100, 1000, and 10,000 simulations, while the runtime was 4.45 s, 39.44 s, and 396.59 s with Familias. Although EasyKin is somewhat slower, parallel computation can be used to speed up the simulation with the stand-alone version of EasyKin (https://github.com/Ryan620/Easykin).

We noticed that the runtime differed greatly between relationships. Thirty-seven common scenarios in forensic casework listed in Ge et al.’s study [1] were simulated and the runtime was compared. The results showed that only several seconds were needed for most scenarios with 100 simulations under the “Equal” mutation model using the 23 STRs in AmpFlSTR Huaxia. More time is expected when cousins are included, e.g., 2040.38 s for two cousins (who are also cousins of each other) plus the POI (Supplementary Fig. 1). In addition, we found that the runtime increased linearly with both the number of STRs and the number of simulations (Supplementary Fig. 2).

There are still some limitations in the current version of EasyKin. First, three mutation models are implemented, i.e., “Equal,” “Proportional,” and “Stepwise,” but calculation under the “Equal” model is more efficient and faster. If the “Stepwise” model is specified, LR calculation may be time-consuming for some scenarios. Some common and intuitively simple pairwise relationships still take from several seconds to thousands of seconds. In one test with 100 simulations and 23 STRs in local mode, parent-child, full-sibling, half-sibling, grandparent-grandchild, and avuncular-nephew relationships needed approximately 6 s, 23 s, 7 s, 7 s, and 4000 s, respectively. Therefore, we recommend that users choose the “Equal” model for pedigree simulations, given the almost identical LR distributions under the three mutation models. Second, the relationships among the references are not validated, but EasyKin automatically calculates the LRs for all pairs of references. If any false relationship is found, the individual(s) with the false relationship should be removed. Similarly, the true relationship may differ from both of the stated hypotheses, which may introduce bias into the test results [29]. This kind of issue will be studied in our future work. Finally, dependence among markers, especially those on the same chromosome, should also be considered. If dependence is found between markers, one of them should be excluded.

Conclusion

EasyKin is a flexible and user-friendly online tool for kinship testing. It provides a one-stop solution for forensic use, that is, instructing users to choose appropriate kits and reference relatives before testing, calculating LR automatically in the testing, and providing metrics for data interpretation after testing. We think it will greatly benefit both forensic and non-forensic practitioners.