INTRODUCTION

In a previous paper, S.A. Smirnova, G.G. Omel’yanyuk, and G.I. Bebeshko [1] presented an algorithm for validation of forensic methods, including quantitative measurement methods. The paper discussed the current general (legal) provisions and described validation parameters and ways of measuring them in detail. Qualitative analysis methods were not considered.

In our view, in terms of metrology, forensic methods can be divided into two types: forensic measurement methods (FMMs) and forensic testing methods (FTMs).

FMMs are sets of operations of quantitative analysis of forensic objects used to solve forensic problems. They involve performing quantitative measurements of certain properties of the examined objects. FTMs are understood as sets of testing operations on forensic objects used to solve forensic problems where the monitored indicator takes a binary value, for example, “presence/absence of a substance” and “match/mismatch of a feature.” These forensic methods generally involve determining a set of features of examined forensic objects and/or qualitative reactions of these objects to specific effects during testing. FMMs and FTMs can be used for solving forensic problems of classification, identification, and diagnosis both separately and together.

The forensic method (FM) validation procedure consists of the following operations:

selecting validation parameters (metrological characteristics and quality indicators of the FM);

designing an experimental study of the validation parameters;

performing the experiment and obtaining results;

calculating and estimating the validation parameters using mathematical statistical methods.

The results of the algorithm are used to formulate conclusions, reach the decision on whether the method fulfills its purpose and function, and prepare the validation protocol.

Practical application of the validation algorithm in RFCFS laboratories has revealed a number of problems and questions. The greatest difficulties, especially for FTMs, arise at the stages of selecting validation parameters, developing the validation experiment, and performing statistical calculations. This article uses specific examples to present methodological approaches to the calculation of key validation parameters—FMM and FTM quality indicators.

FMM VALIDATION PROCEDURE

Validation parameters. Based on a generalization of a number of articles and regulatory documents [2–9], the following evaluation parameters were chosen for FMMs:

(1) metrological characteristics or properties of the FMM—specificity; linearity; range of detectable values; limit of detection or quantitation; sensitivity;

(2) quality indicators—precision, trueness, accuracy or uncertainty of analysis results.

The choice of the specific validation parameters depends on the type of method, the scope of its application, and the specific requirements for the forensic examination. If changes concerning the scope, conditions of application, and replacement of measuring instruments, materials, or reagents are made to the standardized method, the changed aspects undergo validation, during which the FMM quality indicators are reevaluated.

The validation experiment must always balance price, risk, and technical capabilities. It needs to meet the customer’s requirements for accuracy and quality of the analysis and make provisions for the risk of false results.

Validation is performed using tested and reliable measurement instruments in strict compliance with requirements specified in the technical documentation.

The necessary scope of experimental research depends on the area of application of the forensic method, availability of information about the examined object, and sufficient availability of necessary reference samples (evaluation samples) with known properties. The main requirement for the experiment is availability of a representative sample of the measured values of the monitored indicator.

The experimental procedure for studying FMM validation parameters in accordance with a predeveloped plan may vary. For the sake of clarity, the procedure is presented in the form of schemes (Figs. 1, 2).

Fig. 1.

Scheme of the validation experiment for the FMM “Determination of pH and Specific Electrical Conductance in Soil Samples for the Purposes of Forensic Environmental Examination” [10].

Fig. 2.

Scheme of the validation experiment for the FMM “Determination of Benzo[a]pyrene in Soil Samples Using HPLC with Fluorometric Detection for the Purposes of Forensic Environmental Examination” [11].

Figure 1 depicts the scheme of an FMM validation experiment with reference samples [10].

Because the relevant regulatory documents contained significantly different procedures for the preparation of aqueous extracts of examined soil samples, the FMM was unified: the optimal sample dilution level and extract holding time were specified. As the laboratory did not at the time possess the necessary certified reference samples, reference soil samples were collected from real objects in advance. Reference values of pH and specific electrical conductance (SEC) were determined for the samples.

The experiment involved five operators (L = 5), each with their own set of reagents and equipment. They performed, at different times, six (N = 6) parallel measurements of the tested parameters in three samples (M = 3). Two extracts with different dilutions were analyzed 5 min, 1 h, and 24 h after preparation—a total of six variations of conditions of analysis.

When the unified method was revalidated a year later, it was determined that the average pH and SEC values in the reference soil samples were within the uncertainty interval of the accepted reference values. This confirmed the fitness of the unified FMM for environmental forensic examinations.

Figure 2 depicts the scheme of an FMM validation experiment [11] with standard additions.

The FMM [11] was a modification of the standardized method [12]. The modification concerned the solid-phase extraction stage, at which a different sorbent was used.

The validation evaluated the key characteristics of the modified method (specificity, linearity, limit of detection, range of detectable concentrations) and its quality indicators (precision, accuracy of method, and uncertainty).

To assess precision, three reference soil samples (M = 3) with different contents of benzo[a]pyrene (BaP) were divided between three independent operators (L = 3), who performed three parallel measurements (N = 3) each. Each operator analyzed the samples on different days within three months, using their own set of reagents, equipment, and laboratory glassware.

To assess trueness, three additions of the standard BaP solution were successively added to the reference sample with the lowest BaP content. The sample was analyzed before and after each addition. Each operator repeated the analysis in this sequence three times.

It was determined that the uncertainty of the results of BaP determination in the spiked samples did not exceed the limits set for the standard deviation of reproducibility of the standardized method. This confirmed the fitness of the FMM for environmental/soil forensic examinations.

Conducting measurements and processing results. In this specific experimental design, the measurement procedure included the following:

preparation of technical instruments and, if necessary, the examined samples;

measurement of the monitored indicators, conducted by experts;

presentation of the experimental data as summary tables containing measurement results;

calculation of the FMM validation parameters.

The formulas used in the calculations in this paper are taken from publications [13–15] and regulatory and technical documents [2–4].

Let us introduce the following notation:

number of evaluated test samples (ES) m = 1,…, M;

number of operators l = 1, …, L;

number of days, if the measurements are taken on different days in conditions of repeatability, k = 1,…, K;

number of parallel measurements in conditions of repeatability n = 1, …, N.

Sequence of Statistical Calculations

1. First, calculate the arithmetic mean \(x_{ml}\) and the sample variance \(S_{ml}^{2}\) of the results of single measurements in the mth ES in conditions of repeatability.

2. To exclude gross errors, or outliers, from the results, perform the Grubbs tests

$$G{{R}_{{ml,\max }}} = \frac{{{{x}_{{ml,\max }}} - {{x}_{{ml}}}}}{{{{S}_{{ml}}}}}\,\,\,{\text{and}}\,\,\,G{{R}_{{ml,\min }}} = \frac{{{{x}_{{ml}}} - {{x}_{{ml,\min }}}}}{{{{S}_{{ml}}}}},$$

and compare the results with the critical value \(GR_{\text{tbl}}\) for the number of degrees of freedom v, which corresponds to the number of series of measurement results (the number of laboratories or operators). The values of \(GR_{\text{tbl}}\) are given in the appendix to RMG 61-2010.

If \(GR_{ml,\max} > GR_{\text{tbl}}\) and/or \(GR_{ml,\min} > GR_{\text{tbl}}\), then the corresponding result \(x_{ml,\max}\) or \(x_{ml,\min}\) is excluded from further calculations. It is appropriate to exclude no more than two results; if more than two results fail the test, the obtained data need to be analyzed.
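The Grubbs check in step 2 can be sketched in Python as follows. This is a minimal illustration, not part of the cited procedure: the function names are hypothetical, and the critical value must be taken from the table in RMG 61-2010 (the number used in the usage example is only a placeholder).

```python
from statistics import mean, stdev


def grubbs_statistics(values):
    """Return (GR_max, GR_min) for one series of parallel measurements."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation, N - 1 in the denominator
    if s == 0:
        raise ValueError("all values are identical; the statistic is undefined")
    gr_max = (max(values) - x_bar) / s
    gr_min = (x_bar - min(values)) / s
    return gr_max, gr_min


def flag_outliers(values, gr_critical):
    """List the extreme values whose Grubbs statistic exceeds gr_critical."""
    gr_max, gr_min = grubbs_statistics(values)
    flagged = []
    if gr_max > gr_critical:
        flagged.append(max(values))
    if gr_min > gr_critical:
        flagged.append(min(values))
    return flagged


# Hypothetical pH readings; 1.887 is a placeholder critical value.
readings = [7.1, 7.2, 7.0, 7.1, 7.2, 9.0]
print(flag_outliers(readings, gr_critical=1.887))
```

The sketch only flags candidate outliers; the decision to exclude them, and the limit of two exclusions, remains with the analyst.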

3. To establish the possibility of combining the variances \(S_{{ml}}^{2}\), evaluate their homogeneity using the Cochran test (G) if comparing more than two variances or the Fisher F-test (F) for two variances. In the first case, the compared series have the same number of measurements; in the second case, they may have different numbers of measurements.

The Cochran test is calculated according to the formula

$${{G}_{{m(\max )}}} = \frac{{{{{(S_{{m,l}}^{2})}}_{{\max }}}}}{{\sum\limits_{l = 1}^L {S_{{m,l}}^{2}} }},$$

where m is the sample number and l is the operator number.

The value of \(G_{m(\max)}\) is compared to the table value \(G_{\text{tbl}}\) for the number of degrees of freedom v = N – 1, which corresponds to the maximum variance, for f, which corresponds to the number of summed variances, and for the accepted confidence coefficient. If at the selected level of 95% probability \(G_{m(\max)} > G_{\text{tbl}}\), then the corresponding value of \((S_{ml}^{2})_{\max}\) is excluded from further calculations. It is appropriate to exclude no more than two variances.

The non-excluded sample variances are considered homogeneous, and therefore, the measurement results can be combined into a single set.

The Fisher F-test is calculated according to the formula \(F = S_{1}^{2}/S_{2}^{2}\), where \(S_{1}^{2} > S_{2}^{2}\). If \(F < F_{\text{tbl}}\), the measurement results can be combined into a single set.
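Both homogeneity statistics of step 3 reduce to simple ratios of sample variances. The following Python sketch uses hypothetical function names and leaves the comparison with the tabulated critical values to the user.

```python
def cochran_statistic(variances):
    """Largest variance divided by the sum of all L variances (one sample m)."""
    total = sum(variances)
    if total <= 0:
        raise ValueError("the sum of variances must be positive")
    return max(variances) / total


def variance_ratio(var_a, var_b):
    """F statistic for two variances, with the larger one in the numerator."""
    larger, smaller = max(var_a, var_b), min(var_a, var_b)
    if smaller <= 0:
        raise ValueError("variances must be positive")
    return larger / smaller


# Hypothetical variances obtained by four operators for one sample.
print(cochran_statistic([0.01, 0.02, 0.03, 0.04]))
print(variance_ratio(0.02, 0.08))
```

Placing the larger variance in the numerator keeps the F statistic at or above 1, which matches the condition on the two variances in the formula above.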

4. Next, using the non-excluded values of \(x_{ml}\) and \(S_{ml}\), calculate the total average of the measurement results \(x_{m}\) and the variance \(S_{m}^{2}\) of the scatter of the average results \(x_{ml}\) relative to the total average value. Its square root \(S_{m}\) is the mean square deviation (MSD) of this scatter.

5. Calculate the MSD of repeatability of each expert’s results (or the MSD of method repeatability), \(S_{r,m}\).

6. Using the values of \(S_{m}\) and \(S_{r,m}\), calculate the MSD of method reproducibility, \(S_{R,m}\).

7. To assess method trueness, calculate the bias Θ as the difference between the total average value \(x_{m}\) and the certified value in the mth ES. When using standard additions, calculate the difference between the average value obtained in the spiked sample and the certified value of the standard addition for the mth ES.

8. Calculate the residual systematic bias parameter, or method trueness indicator, \(\Delta_{c,m}\).

9. Check the significance of the bias using the Student’s t-test (\(t_{m}\)). If the estimated bias is insignificant compared to the random scatter, take Θ = 0. If the bias is significant, a decision may be made to correct the results of determination when the method is implemented.

10. Evaluate the accuracy of measurement results by calculating the expanded uncertainty. The budget typically includes the uncertainty of random scatter of the results \(u(x_{m})\), which is the same as \(S_{R,m}\), and the bias uncertainty \(\Delta_{c,m}\). The uncertainties associated with sample preparation (weighing, dilution) and with calibration plots are usually insignificant.

The formulas for calculating FMM quality indicators are given in Table 1.

Table 1.   Items, designations, and formulas for calculation of the statistical parameters

It should be noted that calculations with Table 1 formulas can be easily performed in Microsoft Excel.
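The same calculations can be scripted. The following Python sketch covers steps 1 and 4 to 7 for one evaluation sample. Because Table 1 is not reproduced in this text, the way the variances are combined is an assumption that follows the balanced-design approach of ISO 5725-2 and may differ in detail from the Table 1 formulas. All names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def precision_indicators(series, reference_value=None):
    """Estimate precision indicators for one evaluation sample.

    series: list of L lists (one per operator), each holding N parallel
    results that have passed the outlier and homogeneity checks.
    """
    n = len(series[0])
    if n < 2 or len(series) < 2 or any(len(s) != n for s in series):
        raise ValueError("a balanced design with L >= 2 and N >= 2 is assumed")

    operator_means = [mean(s) for s in series]    # x_ml
    grand_mean = mean(operator_means)             # x_m
    s_m2 = variance(operator_means)               # S_m^2, scatter of the means
    s_r2 = mean([variance(s) for s in series])    # pooled repeatability variance

    # Assumption (ISO 5725-2 style): the between-operator component is
    # S_m^2 - S_r^2 / N, set to zero if negative, and is added to S_r^2.
    s_l2 = max(s_m2 - s_r2 / n, 0.0)
    s_R2 = s_r2 + s_l2

    result = {
        "x_m": grand_mean,
        "S_m": sqrt(s_m2),
        "S_r": sqrt(s_r2),
        "S_R": sqrt(s_R2),
    }
    if reference_value is not None:
        result["bias"] = grand_mean - reference_value  # theta
    return result


# Hypothetical data: three operators, two parallel measurements each.
data = [[10.0, 10.2], [10.4, 10.6], [9.8, 10.0]]
print(precision_indicators(data, reference_value=10.0))
```

The significance test of the bias and the expanded uncertainty are left out of the sketch because they depend on the specific Table 1 formulas.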

FTM VALIDATION PROCEDURE

General information. Binary response FTMs are widely used in forensic practice and also need to be tested for serviceability and fitness for purpose.

In this case, uncertainty cannot be expressed in the same way as in quantitative analysis, i.e., as a parameter that characterizes the variance of the results or as an acceptable scatter of the predicted value. Instead, uncertainty is typically probabilistic in nature and can be expressed as the probability of making a wrong decision or getting a false result.

Low uncertainty of an FTM, i.e., high reliability of testing, is generally characterized by a low level of erroneous results, that is, a small proportion of false results in the total number of tests.

Problems of validation and questions of metrology and terminology of qualitative methods are actively discussed in the Russian and foreign literature [16–23]. However, detailed standardized methods for assessing reliability of qualitative testing have not yet been developed.

Validation parameters. Only the most critical parameters are evaluated for FTMs: the reliability of the FTM, competence of the expert, and in some cases limit of detection (when the purpose of the analysis is to determine the presence or absence of a particular substance and it is necessary to determine the minimum detectable concentration). The linearity, operating range, and limit of quantification are not evaluated.

In this paper, following the above publications, the reliability of methods and competence of experts were also characterized by proportions of false positive and negative test results.

The validation experiment is performed to confirm reproducibility of the test results. In the experiment, the FTM is repeatedly performed on test samples (TS) with known monitored indicators by several experts at different times. For FTMs that solve classification problems of determining the presence/absence of a component, for example, a specific toxicant, two types of samples are used: a sample in which the content of this toxicant is above the permissible limit and blank samples that by design do not contain the toxicant. For FTMs that establish the presence/absence of a specific set of features, test samples in which the tested features are regulated are used.

Comparing the results of testing performed on test samples by forensic experts with the corresponding known TS indicators makes it possible to determine whether the experts’ results are true or false. Note that mistakes can be found in the analysis of both samples that contain the toxicant and blank samples. There is a distinction between false/true positive results for toxicant-containing samples and false/true negative results for blank samples.

Figure 3 depicts the scheme of the validation experiment for the FTM “Microscopic Examination of Textile Fibers” [24]. The experiment involved four expert operators who, at different times, examined 11 different reference fiber samples under the microscope to determine the presence or absence of 100 external features, of which 36 were present and 64 were absent.

Fig. 3.

Scheme of the validation experiment for the FTM “Microscopic Examination of Textile Fibers” [24].

In the test results obtained by the experts, a true positive result (TP) meant that the presence of the feature was correctly determined, and a false positive result (FP) meant that the presence of the feature was falsely recorded. A true negative result (TN) meant that the absence of the feature was correctly determined, and a false negative result (FN) meant that the absence of the feature was falsely recorded.

A digital table for recording the test results was developed and uploaded to the local network of the laboratory. In the tables of test results obtained by the experts, as well as in the reference tables that contain known TS features, the presence of a feature is indicated by the “+” sign, and the absence is indicated by the “–” sign.
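Comparing an expert's table with the reference table can be automated. The sketch below is an illustration with hypothetical names; it assumes that each table is a list of "+" and "-" marks in the same feature order, and it accepts the typographic "–" as a synonym of "-".

```python
from collections import Counter


def classify(reference_mark, recorded_mark):
    """Classify one recorded mark against the known reference mark."""
    ref = reference_mark.replace("–", "-")
    rec = recorded_mark.replace("–", "-")
    outcomes = {
        ("+", "+"): "TP",  # presence correctly determined
        ("-", "+"): "FP",  # presence falsely recorded
        ("-", "-"): "TN",  # absence correctly determined
        ("+", "-"): "FN",  # absence falsely recorded
    }
    if (ref, rec) not in outcomes:
        raise ValueError(f"unexpected marks: {reference_mark!r}, {recorded_mark!r}")
    return outcomes[(ref, rec)]


def tally(reference_marks, recorded_marks):
    """Count TP, FP, TN, and FN for one expert."""
    if len(reference_marks) != len(recorded_marks):
        raise ValueError("the tables must list the same features")
    return Counter(classify(r, e) for r, e in zip(reference_marks, recorded_marks))


# Hypothetical five-feature example.
reference = ["+", "+", "-", "-", "-"]
recorded = ["+", "-", "-", "+", "-"]
print(tally(reference, recorded))
```

Summing the per-expert counters gives the totals for the method as a whole.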

The results of the experiment showed that the probability of false results for the FTM as a whole does not exceed 2.2%, and the probability of false results for each of the experts does not exceed 3.1%. This suggests that the reliability of the method is high and the competence of the forensic experts is sufficient.

Formulas for calculating FTM quality indicators. It is difficult to provide uniform formulas for calculating indicators or proportions of false results. The difficulties are related to the peculiarities of testing: positive and negative results can relate to the same tested feature in different samples (test and blank), or to a set of missing/present independent features in the same sample. In addition, the proportions of false positive/negative results can be calculated in relation to the total number of true positive/negative results or to the total number of all positive/negative results [16].

The indicators (or frequency) of the corresponding test results are essentially probabilistic in nature. The probabilities were calculated using the formulas given in [16, 21].

It is recommended to present the results of validation experiments in the form of summary tables. The results for the considered method are presented in Table 2 [24].

Table 2.   Results of the validation experiment for the FTM “Microscopic Examination of Textile Fibers” (total number of features in all test samples N = 100)

Calculate the indicator (probability) of false results for the presence of features as the ratio of the number of false positive results to the total number of obtained false positive and true positive results: IFP = 3/(132 + 3) × 100 = 2.2%.

Calculate the probability of false negative results for the absence of features in the same way: IFN = 3/(262 + 3) × 100 = 1.1%.

The probability of true results for the presence of features ITP = 132/(132 + 3) × 100 = 97.8%; for the absence of features, ITN = 262/(262 + 3) × 100 = 98.9%.
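The four proportions above can be reproduced with a few lines of Python. The function name is hypothetical; the counts are those used in the calculations above (TP = 132, FP = 3, TN = 262, FN = 3).

```python
def result_proportions(tp, fp, tn, fn):
    """Proportions of false and true results among recorded positives/negatives."""
    positives = tp + fp  # all results recorded as "feature present"
    negatives = tn + fn  # all results recorded as "feature absent"
    if positives == 0 or negatives == 0:
        raise ValueError("both positive and negative results are required")
    return {
        "IFP": fp / positives,
        "ITP": tp / positives,
        "IFN": fn / negatives,
        "ITN": tn / negatives,
    }


proportions = result_proportions(tp=132, fp=3, tn=262, fn=3)
for name, value in proportions.items():
    print(f"{name} = {value * 100:.1f}%")
```

Here the false results are related to the total number of recorded positive or negative results, which is one of the two conventions mentioned above.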

Table 3 shows the assessment of each expert’s competence calculated from data in Table 2.

Table 3.   Probability of false results for each expert

The trueness of FTMs can also be assessed using the likelihood ratio, denoted as LR.

Mathematically, LR is the ratio of the probabilities of two complementary events that form a complete group of events, i.e., whose probabilities sum to 1. The larger the LR value, the more likely the event in the numerator is relative to the event in the denominator. If LR = 1, the events are equally likely.

In the above example of FTM validation, let us denote the presence of features (positive results) as event A, the absence of features (negative results) as event B, and the combined presence of some features and absence of others as event AB. The calculation of LRAB—the likelihood ratio for the combination of the presence of some features and absence of others—is of the greatest interest.

The probability of a false result of event AB is equal to the sum of the probabilities IFP and IFN (0.022 + 0.011 = 0.033), since the result of event AB will be false if at least one of the events (A or B) is false. The probability of a true result of event AB is equal to the product of the probabilities ITP and ITN (0.978 × 0.989 = 0.967), since events A and B are independent and occur simultaneously. True and false results for the set of the presence/absence of features are complementary events, so the following expression is valid:

$${\text{ITP}}{\kern 1pt} \cdot {\kern 1pt} {\text{ITN}} = 1 - \left( {{\text{IFP}} + {\text{IFN}}} \right).$$

The likelihood ratio for the combination of the presence and absence of features LRAB = (1 – 0.033)/0.033 = 29.3.

The found value of LRAB indicates that the probability of a true result of determining the combination of the presence and absence of features is 29.3 times higher than the probability of a false result, which indicates high reliability of the FTM.
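The LRAB calculation can be written as a short function. This sketch uses a hypothetical name and follows the approximation used above, in which the probability of a false result for event AB is taken as the sum IFP + IFN.

```python
def likelihood_ratio_ab(i_fp, i_fn):
    """Ratio of the probability of a true result to that of a false result."""
    p_false = i_fp + i_fn  # false if either the presence or the absence is wrong
    if not 0 < p_false < 1:
        raise ValueError("IFP + IFN must lie strictly between 0 and 1")
    return (1 - p_false) / p_false


# Rounded proportions from the example above.
print(round(likelihood_ratio_ab(0.022, 0.011), 1))
```

The function uses whatever proportions it is given, so rounding the inputs, as in the text, slightly affects the resulting ratio.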

The probabilities of true and false test results obtained by all experts, as well as the obtained LR values, can be used to assess the reliability of the FTM as a whole, and the probabilities of false results obtained by each expert can be used to assess the competence of a particular expert.

CONCLUSIONS

To sum up, this study has attempted to identify and demonstrate methodological approaches to solving the issues of validation of forensic methods related to calculating FMM and FTM quality indicators. It should be noted that methods for evaluating validation parameters have been developed quite thoroughly for FMMs, but evaluation of validation parameters for FTMs requires further development and research.