Abstract
Validation of forensic methods (FMs) is one of the main standardization procedures in forensic practice; its purpose is to verify the reliability of results. The procedure is widely used by member organizations of the European Network of Forensic Science Institutes (ENFSI), which include the Russian Federal Centre of Forensic Science of the Ministry of Justice of the Russian Federation (RFCFS). In terms of metrology, FMs can be divided into two types: forensic measurement methods (FMMs) and forensic testing methods (FTMs). In an earlier paper, the authors showed that methodological approaches to FMM validation are well developed and actively used in RFCFS laboratories, whereas FTM validation procedures remain unsettled and are still extensively discussed in the scientific literature. The most significant difficulties in FTM validation are related to selecting validation parameters, designing the validation experiment, and performing statistical calculations. This article proposes methodological approaches to the statistical assessment of FMM and FTM parameters that can be used in forensic practice. It also provides recommendations for the validation procedure, a list of validation parameters, and designs of specific experiments for FMM and FTM quality assessment. Fitness of FMMs is assessed by repeatedly measuring a monitored indicator in reference samples and in samples with standard additions, using standard formulas for calculating statistical parameters. The FTM validation procedure is illustrated by the FTM “Microscopic Examination of Textile Fibers,” in which the test samples were fibers from the laboratory collection with known tested characteristics. It is demonstrated that, when assessing the reliability of FTMs and the competence of experts, it is efficient to use probabilistic estimates of the rate of false test results and to calculate the likelihood ratio.
INTRODUCTION
In a previous paper, S.A. Smirnova, G.G. Omel’yanyuk, and G.I. Bebeshko [1] presented an algorithm for validation of forensic methods, including quantitative measurement methods. The paper discussed the current general (legal) provisions and described validation parameters and ways of measuring them in detail. Qualitative analysis methods were not considered.
In our view, in terms of metrology, forensic methods can be divided into two types: forensic measurement methods (FMMs) and forensic testing methods (FTMs).
FMMs are sets of operations of quantitative analysis of forensic objects used to solve forensic problems. They involve performing quantitative measurements of certain properties of the examined objects. FTMs are understood as sets of testing operations on forensic objects used to solve forensic problems where the monitored indicator takes a binary value, for example, “presence/absence of a substance” and “match/mismatch of a feature.” These forensic methods generally involve determining a set of features of examined forensic objects and/or qualitative reactions of these objects to specific effects during testing. FMMs and FTMs can be used for solving forensic problems of classification, identification, and diagnosis both separately and together.
The FM validation procedure consists of the following operations:
selecting validation parameters (metrological characteristics and quality indicators of the FM);
designing an experimental study of the validation parameters;
performing the experiment and obtaining results;
calculating and estimating the validation parameters using mathematical statistical methods.
The results of the algorithm are used to formulate conclusions, reach the decision on whether the method fulfills its purpose and function, and prepare the validation protocol.
Practical application of the validation algorithm in RFCFS laboratories has revealed a number of problems and questions. The greatest difficulties, especially for FTMs, arise at the stages of selecting validation parameters, developing the validation experiment, and performing statistical calculations. This article uses specific examples to present methodological approaches to the calculation of key validation parameters—FMM and FTM quality indicators.
FMM VALIDATION PROCEDURE
Validation parameters. On the basis of a generalization of a number of articles and regulatory documents [2–9], the following evaluation parameters were chosen for FMMs:
(1) metrological characteristics or properties of the FMM—specificity; linearity; range of detectable values; limit of detection or quantitation; sensitivity;
(2) quality indicators—precision, trueness, accuracy or uncertainty of analysis results.
The choice of the specific validation parameters depends on the type of method, the scope of its application, and the specific requirements for the forensic examination. If changes concerning the scope, conditions of application, and replacement of measuring instruments, materials, or reagents are made to the standardized method, the changed aspects undergo validation, during which the FMM quality indicators are reevaluated.
The validation experiment must always balance cost, risk, and technical capabilities. It needs to meet the customer’s requirements for accuracy and quality of the analysis and make provisions for the risk of false results.
Validation is performed using tested and reliable measurement instruments in strict compliance with requirements specified in the technical documentation.
The necessary scope of experimental research depends on the area of application of the forensic method, availability of information about the examined object, and sufficient availability of necessary reference samples (evaluation samples) with known properties. The main requirement for the experiment is availability of a representative sample of the measured values of the monitored indicator.
The experimental procedure for studying FMM validation parameters in accordance with a predeveloped plan may vary. For the sake of clarity, the procedure is presented in the form of schemes (Figs. 1, 2).
Figure 1 depicts the scheme of an FMM validation experiment with reference samples [10].
Because the relevant regulatory documents contained significantly different procedures for the preparation of aqueous extracts of examined soil samples, the FMM was unified: the optimal sample dilution level and extract holding time were specified. As the laboratory did not at the time possess the necessary certified reference samples, reference soil samples were collected from real objects in advance. Reference values of pH and specific electrical conductance (SEC) were determined for the samples.
The experiment involved five operators (L = 5), each with their own set of reagents and equipment. They performed, at different times, six (N = 6) parallel measurements of the tested parameters in three samples (M = 3). Two extracts with different dilutions were analyzed 5 min, 1 h, and 24 h after preparation—a total of six variations of conditions of analysis.
When the unified method was revalidated a year later, it was determined that the average pH and SEC values in the reference soil samples were within the uncertainty interval of the accepted reference values. This confirmed the fitness of the unified FMM for environmental forensic examinations.
Figure 2 depicts the scheme of an FMM validation experiment [11] with standard additions.
The FMM [11] was a modification of the standardized method [12]. The modification concerned the solid-phase extraction stage, at which a different sorbent was used.
The validation evaluated the key characteristics of the modified method (specificity, linearity, limit of detection, range of detectable concentrations) and its quality indicators (precision, accuracy of method, and uncertainty).
To assess precision, three reference soil samples (M = 3) with different contents of benzo[a]pyrene (BaP) were divided between three independent operators (L = 3), who performed three parallel measurements (N = 3) each. Each operator analyzed the samples on different days within three months, using their own set of reagents, equipment, and laboratory glassware.
To assess trueness, three additions of the standard BaP solution were successively added to the reference sample with the lowest BaP content. The sample was analyzed before and after each addition. Each operator repeated the analysis in this sequence three times.
It was determined that the uncertainty of the results of BaP determination in the spiked samples did not exceed the reproducibility standard deviation limits established for the standardized method. This confirmed the fitness of the FMM for environmental/soil forensic examinations.
Conducting measurements and processing results. In this specific experimental design, the measurement procedure included the following:
preparation of technical instruments and, if necessary, the examined samples;
measurement of the monitored indicators, conducted by experts;
presentation of the experimental data as summary tables containing measurement results;
calculation of the FMM validation parameters.
The formulas used in the calculations in this paper are taken from publications [13–15] and regulatory and technical documents [2–4].
Let us introduce the following notation:
index of the evaluated test sample (ES): m = 1, …, M;
index of the operator: l = 1, …, L;
index of the day, if the measurements are taken on different days in conditions of repeatability: k = 1, …, K;
index of the parallel measurement in conditions of repeatability: n = 1, …, N.
Sequence of Statistical Calculations
1. First, calculate the arithmetic mean \(x_{ml}\) and the sample variance \(S_{ml}^{2}\) of the results of single measurements in the mth ES in conditions of repeatability.
2. To exclude gross errors, or outliers, from the results, perform the Grubbs tests and compare the resulting statistics \(GR_{ml,\max}\) and \(GR_{ml,\min}\) with the critical value \(GR_{\text{tbl}}\) for the number of degrees of freedom \(v\), which corresponds to the number of series of measurement results (the number of laboratories or operators). The values of \(GR_{\text{tbl}}\) are given in the appendix to RMG 61-2010.
If \(GR_{ml,\max} > GR_{\text{tbl}}\) and/or \(GR_{ml,\min} > GR_{\text{tbl}}\), then the corresponding results \(x_{ml,\max}\) or \(x_{ml,\min}\) are excluded from further calculations. It is appropriate to exclude no more than two results; if more than two results fail the test, the obtained data need to be analyzed as a whole.
3. To establish the possibility of combining the variances \(S_{ml}^{2}\), evaluate their homogeneity using the Cochran test (G) if comparing more than two variances or the Fisher F-test (F) for two variances. In the first case, the compared samples have the same number of measurements; in the second case, they may have different numbers of measurements.
The Cochran test statistic is calculated according to the formula
\[G_{m(\max)} = \frac{(S_{ml}^{2})_{\max}}{\sum_{l=1}^{L} S_{ml}^{2}},\]
where m is the sample number and l is the operator number.
The value of \(G_{m(\max)}\) is compared with the table value \(G_{\text{tbl}}\) for the number of degrees of freedom \(v = N - 1\), which corresponds to the maximum variance, for \(f\), which corresponds to the number of summed variances, and for the accepted confidence coefficient. If \(G_{m(\max)} > G_{\text{tbl}}\) at the selected 95% confidence level, then the corresponding value of \((S_{ml}^{2})_{\max}\) is excluded from further calculations. It is appropriate to exclude no more than two variances.
The non-excluded sample variances are considered homogeneous, and therefore, the measurement results can be combined into a single set.
The Fisher F-statistic is calculated according to the formula \(F = S_{1}^{2}/S_{2}^{2}\), where \(S_{1}^{2} > S_{2}^{2}\). If \(F < F_{\text{tbl}}\), the measurement results can be combined into a single set.
4. Next, using the non-excluded values of \(x_{ml}\) and \(S_{ml}\), calculate the total average of the measurement results \(x_{m}\) and the mean square deviation (MSD, i.e., the standard deviation) \(S_{m}\) of the scatter of the average results \(x_{ml}\) about the total average, together with the corresponding variance \(S_{m}^{2}\).
5. Calculate the MSD of repeatability of each expert’s results (or the MSD of method repeatability), \(S_{r,m}\).
6. Using the values of \(S_{m}\) and \(S_{r,m}\), calculate the MSD of method reproducibility, \(S_{R,m}\).
7. To assess method trueness, calculate the bias \(\Theta\) as the difference between the total average value \(x_{m}\) and the certified value in the mth ES. When standard additions are used, calculate the bias as the difference between the average value obtained for the spiked sample and the certified value of the standard addition for the mth ES.
8. Calculate the residual systematic bias parameter, or method trueness indicator, \(\Delta_{c,m}\).
9. Check the significance of the bias using the Student’s t-test (\(t_{m}\)). If the estimated bias is insignificant compared to the random scatter, take \(\Theta = 0\). Detection of a significant bias may lead to the decision to apply a correction to the results of determination when the method is implemented.
10. Evaluate the accuracy of measurement results by calculating the expanded uncertainty. The budget typically includes the uncertainty of random scatter of the results \(u(x_{m})\), which is the same as \(S_{R,m}\), and the bias uncertainty \(\Delta_{c,m}\). The uncertainties associated with sample preparation (weighing, dilution) and with calibration plots are usually insignificant.
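Steps 1 to 6 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' calculation: the function names and the sample data are hypothetical, the Grubbs statistic is written in its common textbook form applied to the operators' means, and the reproducibility variance follows the balanced-design relation of ISO 5725-2 (variance of the operators' means plus (1 - 1/N) times the repeatability variance), which is assumed here rather than taken from Table 1. Critical values must still be looked up in the tables.

```python
import math
from statistics import mean, variance


def series_stats(series):
    """Step 1: mean and sample variance of each operator's N parallel results."""
    return [mean(s) for s in series], [variance(s) for s in series]


def grubbs_statistics(values):
    """Step 2: Grubbs statistics for the largest and smallest value (assumed textbook form)."""
    centre = mean(values)
    spread = math.sqrt(variance(values))
    return (max(values) - centre) / spread, (centre - min(values)) / spread


def cochran_statistic(variances):
    """Step 3: largest variance divided by the sum of all compared variances."""
    return max(variances) / sum(variances)


def precision(series):
    """Steps 4-6 for one sample m; assumes every operator made the same number N of measurements."""
    n = len(series[0])
    means, variances = series_stats(series)
    total_mean = mean(means)          # x_m
    s2_between = variance(means)      # S_m^2, scatter of the operators' means
    s2_repeat = mean(variances)       # S_{r,m}^2, pooled within-operator variance
    s2_reprod = s2_between + (1 - 1 / n) * s2_repeat  # S_{R,m}^2 (assumed ISO 5725-2 form)
    return total_mean, math.sqrt(s2_repeat), math.sqrt(s2_reprod)


# Hypothetical data: L = 3 operators, N = 3 parallel measurements of one sample.
example_series = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
x_m, s_r, s_R = precision(example_series)
print(x_m, s_r, s_R)
```

The statistics returned by `grubbs_statistics` and `cochran_statistic` are compared with the tabulated critical values before `precision` is applied to the retained series.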
The formulas for calculating FMM quality indicators are given in Table 1.
It should be noted that calculations with Table 1 formulas can be easily performed in Microsoft Excel.
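The trueness and uncertainty steps (7 to 10) can be scripted in the same spirit as a spreadsheet calculation. The sketch below is an assumption-laden illustration, not the content of Table 1: the function names are hypothetical, the standard uncertainty of the bias is assumed to combine the scatter of the operators' means with the standard uncertainty of the certified value, and the expanded uncertainty is assumed to use a coverage factor of 2. The critical t value is passed in by the user from a Student's t table.

```python
import math


def assess_bias(total_mean, certified, s2_between, n_operators, u_certified, t_critical):
    """Steps 7-9: bias, its standard uncertainty (assumed form), and a t-test for significance."""
    theta = total_mean - certified  # bias relative to the certified value
    # Assumed: uncertainty of the mean of the operators' means plus that of the certified value.
    u_bias = math.sqrt(s2_between / n_operators + u_certified ** 2)
    t_value = abs(theta) / u_bias
    return theta, u_bias, t_value, t_value > t_critical


def expanded_uncertainty(s_reprod, u_bias, coverage=2.0):
    """Step 10: combine the reproducibility and bias contributions (assumed coverage factor k = 2)."""
    return coverage * math.sqrt(s_reprod ** 2 + u_bias ** 2)


# Hypothetical numbers for one sample and four operators.
theta, u_bias, t_value, significant = assess_bias(
    total_mean=3.0, certified=2.5, s2_between=1.0,
    n_operators=4, u_certified=0.0, t_critical=3.18,
)
print(theta, u_bias, t_value, significant)
print(expanded_uncertainty(s_reprod=1.2, u_bias=u_bias))
```

If the test reports an insignificant bias, the bias is taken as zero, as in step 9, while its uncertainty still enters the budget.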
FTM VALIDATION PROCEDURE
General information. Binary response FTMs are widely used in forensic practice and also need to be tested for serviceability and fitness for purpose.
In this case, uncertainty cannot be expressed in the same way as in quantitative analysis, i.e., as a parameter that characterizes the variance of the results or as an acceptable scatter of the predicted value. Instead, uncertainty is typically probabilistic in nature and can be expressed as the probability of making a wrong decision or getting a false result.
Low uncertainty of FTMs or reliability of testing is generally defined by a low level of erroneous results, i.e., a small proportion of false results in the total number of tests.
Problems of validation and questions of metrology and terminology of qualitative methods are actively discussed in the Russian and foreign literature [16–23]. However, detailed standardized methods for assessing reliability of qualitative testing have not yet been developed.
Validation parameters. Only the most critical parameters are evaluated for FTMs: the reliability of the FTM, competence of the expert, and in some cases limit of detection (when the purpose of the analysis is to determine the presence or absence of a particular substance and it is necessary to determine the minimum detectable concentration). The linearity, operating range, and limit of quantification are not evaluated.
In this paper, following the above publications, the reliability of methods and competence of experts were also characterized by proportions of false positive and negative test results.
The validation experiment is performed to confirm reproducibility of the test results. In the experiment, the FTM is repeatedly performed on test samples (TS) with known monitored indicators by several experts at different times. For FTMs that solve classification problems of determining the presence/absence of a component, for example, a specific toxicant, two types of samples are used: samples in which the content of this toxicant is above the permissible limit and blank samples that by design do not contain the toxicant. For FTMs that establish the presence/absence of a specific set of features, test samples whose tested features are known in advance are used.
Comparing the results of testing performed on test samples by forensic experts with the corresponding known TS indicators makes it possible to determine whether the experts’ results are true or false. Note that mistakes can be found in the analysis of both samples that contain the toxicant and blank samples. There is a distinction between false/true positive results for toxicant-containing samples and false/true negative results for blank samples.
Figure 3 depicts the scheme of the validation experiment for the FTM “Microscopic Examination of Textile Fibers” [24]. The experiment involved four expert operators evaluating a set of external features of 11 different reference fiber samples by conducting, at different times, microscopic studies of each sample with the goal to determine the presence/absence of 100 external features, of which 36 were present and 64 were absent.
In the test results obtained by the experts, a true positive result (TP) meant that the presence of the feature was correctly determined, and a false positive result (FP) meant that the presence of the feature was recorded falsely. A true negative result (TN) meant that the absence of the feature was correctly determined, and a false negative result (FN) meant that the absence of the feature was recorded falsely.
A digital table for recording the test results was developed and uploaded to the local network of the laboratory. In the tables of test results obtained by the experts, as well as in the reference tables that contain known TS features, the presence of a feature is indicated by the “+” sign, and the absence is indicated by the “–” sign.
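Comparing an expert's table with the reference table can be automated. The following sketch is hypothetical (the function name, the data, and the use of the ASCII "-" in place of the printed "–" sign are all assumptions); it classifies each recorded sign according to the TP/FP/TN/FN definitions given above.

```python
from collections import Counter


def tally_results(reference, recorded):
    """Count TP, FP, TN, FN by comparing recorded signs with the reference signs."""
    if len(reference) != len(recorded):
        raise ValueError("reference and recorded tables must have the same length")
    counts = Counter({"TP": 0, "FP": 0, "TN": 0, "FN": 0})
    for ref, rec in zip(reference, recorded):
        if rec == "+":
            # Presence recorded: true if the feature is really present, otherwise false.
            counts["TP" if ref == "+" else "FP"] += 1
        else:
            # Absence recorded: true if the feature is really absent, otherwise false.
            counts["TN" if ref == "-" else "FN"] += 1
    return dict(counts)


# Hypothetical six-feature example: one row of the reference table and one expert's row.
print(tally_results("++--+-", "+--++-"))
```

Summing such tallies over all samples and experts gives the counts that enter a summary table of test results.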
The results of the experiment showed that the probability of false results for the FTM as a whole does not exceed 2.2%, and the probability of false results for each of the experts does not exceed 3.1%. This suggests that the reliability of the method is high and the competence of the forensic experts is sufficient.
Formulas for calculating FTM quality indicators. It is difficult to provide uniform formulas for calculating indicators or proportions of false results. The difficulties are related to the peculiarities of testing: positive and negative results can relate to the same tested feature in different samples (test and blank), or to a set of missing/present independent features in the same sample. In addition, the proportions of false positive/negative results can be calculated in relation to the total number of true positive/negative results or to the total number of all positive/negative results [16].
The indicators (or frequency) of the corresponding test results are essentially probabilistic in nature. The probabilities were calculated using the formulas given in [16, 21].
It is recommended to present the results of validation experiments in the form of summary tables. The results for the considered method are presented in Table 2 [11].
Calculate the indicator (probability) of false results for the presence of features as the ratio of the number of false positive results to the total number of obtained false positive and true positive results: \(I_{FP} = 3/(132 + 3) \times 100 = 2.2\%\).
Calculate the probability of false negative results for the absence of features in the same way: \(I_{FN} = 3/(262 + 3) \times 100 = 1.1\%\).
The probability of true results for the presence of features is \(I_{TP} = 132/(132 + 3) \times 100 = 97.8\%\); for the absence of features, \(I_{TN} = 262/(262 + 3) \times 100 = 98.9\%\).
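The four indicators can be reproduced with a short script. This is a minimal sketch with a hypothetical function name; it follows the ratios used in the text (false and true results are related to the total number of recorded positive or recorded negative results) and uses the counts quoted above.

```python
def ftm_indicators(tp, fp, tn, fn):
    """Return the indicators of false and true results, in percent."""
    positives = tp + fp  # all results recorded as "feature present"
    negatives = tn + fn  # all results recorded as "feature absent"
    return {
        "IFP": 100 * fp / positives,
        "IFN": 100 * fn / negatives,
        "ITP": 100 * tp / positives,
        "ITN": 100 * tn / negatives,
    }


# Counts quoted in the text for the fiber-examination method.
indicators = ftm_indicators(tp=132, fp=3, tn=262, fn=3)
for name, value in indicators.items():
    print(name, round(value, 1))
```

The same function can be applied to the counts of a single expert to obtain expert-level indicators.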
Table 3 shows the assessment of each expert’s competence calculated from data in Table 2.
The trueness of FTMs can also be assessed using the likelihood ratio, denoted as LR.
Mathematically, LR is the ratio of the probabilities of two complementary events that form a complete group of events, the total probability of which is 1. The larger the LR value, the more likely the event in the numerator is relative to the event in the denominator. If LR = 1, the events are equally likely.
In the above example of FTM validation, let us denote the presence of features (positive results) as event A, the absence of features (negative results) as event B, and the combined presence of some features and absence of others as event AB. The calculation of \(LR_{AB}\), the likelihood ratio for the combination of the presence of some features and absence of others, is of the greatest interest.
The probability of a false result of event AB is equal to the sum of the probabilities \(I_{FP}\) and \(I_{FN}\) (0.022 + 0.011 = 0.033), since the result of event AB will be false if at least one of the events (A or B) is false; the small joint term \(I_{FP} \times I_{FN}\) is neglected. The probability of a true result of event AB is equal to the product of the probabilities \(I_{TP}\) and \(I_{TN}\) (0.978 × 0.989 = 0.967), since events A and B are independent and occur simultaneously. True and false results for the set of the presence/absence of features are complementary events, so the following expression is valid:
\[P_{\text{true}}(AB) + P_{\text{false}}(AB) = 0.967 + 0.033 = 1.\]
The likelihood ratio for the combination of the presence and absence of features is \(LR_{AB} = (1 - 0.033)/0.033 = 29.3\).
The found value of \(LR_{AB}\) indicates that the probability of a true result of determining the combination of the presence and absence of features is 29.3 times higher than the probability of a false result, which indicates high reliability of the FTM.
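The arithmetic behind this likelihood ratio can be checked with a few lines of code. The sketch below uses a hypothetical function name and the rounded probabilities quoted in the text; it treats the false-result probability of the combined event as the simple sum of the two false-result probabilities, as the text does.

```python
def likelihood_ratio_ab(p_false_positive, p_false_negative):
    """Ratio of the probabilities of a true and a false result for the combined event AB."""
    p_false = p_false_positive + p_false_negative  # sum rule used in the text
    return (1 - p_false) / p_false


# Rounded probabilities quoted in the text.
lr_ab = likelihood_ratio_ab(0.022, 0.011)
print(round(lr_ab, 1))

# Cross-check: the product of the true-result probabilities is close to 1 - 0.033.
print(round(0.978 * 0.989, 3))
```

Because the ratio is sensitive to rounding of small probabilities, carrying unrounded probabilities through the calculation may give a somewhat different value.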
The probabilities of true and false test results obtained by all experts, as well as the obtained LR values, can be used to assess the reliability of the FTM as a whole, and the probabilities of false results obtained by each expert can be used to assess the competence of a particular expert.
CONCLUSIONS
To sum up, this study has attempted to identify and demonstrate methodological approaches to solving the issues of validation of forensic methods related to calculating FMM and FTM quality indicators. It should be noted that methods for evaluating validation parameters have been developed quite thoroughly for FMMs, but evaluation of validation parameters for FTMs requires further development and research.
REFERENCES
1. Smirnova, S.A., Omel’yanyuk, G.G., and Bebeshko, G.I., Methodological approaches to the validation of forensic methods, including measurement methods (MM), Teor. Prakt. Sud. Ekspert., 2012, no. 1 (25), pp. 50–62.
2. GOST (State Standard) R ISO 5725-2002: Accuracy (Trueness and Precision) of Measurement Methods and Results, Moscow: Izd. Standartov, 2002, parts 1–6.
3. RMG-61-2010. Pokazateli tochnosti, pravil’nosti, pretsizionnosti metodiki kolichestvennogo khimicheskogo analiza. Metod otsenki (RMG-61-2010: Accuracy, Correctness, and Precision of the Quantitative Chemical Analysis Procedures, Evaluation Methods), Moscow: Standartinform, 2013.
4. Quantifying Uncertainty in Analytical Measurement: EURACHEM-CITAC Guide, Ellison, S.L.R., Rosslein, M., and Williams, A., Eds., Teddington: EuraChem, 2000, no. QUAM:2012.P1.
5. Smirnova, S.A., Omel’yanyuk, G.G., Usov, A.I., and Bebeshko, G.I., Special considerations in applying the key terms and definitions of the international standard GOST ISO/IEC 17025–2009 in forensic science laboratories, Teor. Prakt. Sud. Ekspert., 2012, no. 2 (26), pp. 57–67.
6. Paneva, V.I., Assessment of the suitability of quantitative analysis methods in the laboratory, Zavod. Lab., Diagn. Mater., 2008, vol. 74, no. 8, pp. 68–72.
7. Prichard, E. and Barwick, V., Quality Assurance in Analytical Chemistry, Chichester: Wiley, 2007.
8. Mezhdunarodnyi slovar’ po metrologii (International Vocabulary of Metrology), St. Petersburg: Professional, 2010.
9. ISO/IEC GUIDE 99:2007: International Vocabulary of Metrology—Basic and General Concepts and Associated Terms (VIM), Geneva: Int. Stand. Org., 2007. http://www.iso.org/standard/45324.html.
10. Bebeshko, G.I., Omel’yanyuk, G.G., Nikulina, M.V., and Valitova, A.R., A practice of validation of method of determination of pH and specific electrical conductivity in the objects of soil-geological origin for production of forensic environmental examination in the absence of standard samples, Teor. Prakt. Sud. Ekspert., 2017, vol. 12, no. 2, pp. 66–74.
11. Smirnova, S.A., Omel’yanyuk, G.G., Bebeshko, G.I., and Yudin, N.V., The experience of validation of measurement method “The determination of benzo(a)pyrene concentration in the objects of soil-geological origin by means of HPLC fluorimetry detecting method” for production of forensic environmental examination, Teor. Prakt. Sud. Ekspert., 2012, no. 3 (27), pp. 79–91.
12. MUK 4.1.1274-03. Metody kontrolya. Khimicheskie faktory. Izmerenie massovoi doli benz(a)pirena v probakh pochv, gruntov, donnykh otlozhenii i tverdykh otkhodov metodom VEZhKh s ispol’zovaniem fluorimetricheskogo detektora (MUK 4.1.1274-03. Test Methods. Chemical Factors. Measurement of the Mass Fraction of Benz(a)pyrene in Soil, Sediments, and Solid Waste Samples by HPLC using a Fluorometric Detector), Moscow: Minist. Zdravookhr. Ross., 2003.
13. Doerffel, K., Analytical science—a discipline between chemistry and metrology, Fresenius J. Anal. Chem., 1998, vol. 363, no. 5, pp. 393–394.
14. Doerffel, K., Statistik in Der Analytischen Chemie, Leipzig: Dtsch. Verlag Grundstoffind., 1966.
15. Gauthier, T.D., Statistical methods, in Introduction to Environmental Forensics, Murphy, B.L. and Morrison, R.D., Eds., London: Elsevier, 2004, ch. 10, pp. 391–428.
16. The Expression of Uncertainty in Qualitative Testing: EURACHEM/CITAC Guide, Teddington: EuraChem, 2003, no. LGCN/ VAM/2003/048/.
17. Pulido, A., Ruisánchez, I., Boqué, R., and Rius, F.X., Uncertainty of results in routine qualitative analysis, TrAC, Trends Anal. Chem., 2003, vol. 22, no. 10, pp. 647–654. https://doi.org/10.1016/S0165-9936(03)01104-X
18. Ellison, S.L.R. and Fearn, T., Characterizing the performance of qualitative analytical methods: Statistics and terminology, TrAC, Trends Anal. Chem., 2005, vol. 24, no. 6, pp. 468–476. https://doi.org/10.1016/j.trac.2005.03.007
19. Trullols, E., Ruisánchez, I., Rius, F.X., and Huguet, J., Validation of qualitative methods of analysis that use control samples, TrAC, Trends Anal. Chem., 2004, vol. 23, no. 2, pp. 137–145. https://doi.org/10.1016/j.trac.2005.04.001
20. Panteleimonov, A.V., Nikitina, N.A., Reshetnyak, E.A., et al., Binary response procedures of qualitative analysis: methodological characteristics and calculation aspects, Metody Ob’ekty Khim. Anal., 2008, vol. 3, no. 2, pp. 128–146.
21. Mil’man, B.L., Introduction to Forensic Identification, St. Petersburg: VVM, 2008.
22. Mil’man, B.L. and Konopel’ko, L.A., Uncertainty of qualitative chemical analysis: General methodology and binary test methods, J. Anal. Chem., 2004, vol. 59, no. 12, pp. 1128–1141. https://doi.org/10.1023/B:SANC.0000049712.88066.e7
23. Mil’man, B.L., Identification of chemical compounds, TrAC, Trends Anal. Chem., 2005, vol. 24, no. 6, pp. 493–508. https://doi.org/10.1016/j.trac.2005.03.013
24. Smirnova, S.A., Bebeshko, G.I., Lyubetskaya, I.P., et al., Probability-based validation of the forensic method “Microscopic analysis of textile fibers,” Teor. Prakt. Sud. Ekspert., 2019, vol. 14, no. 2, pp. 92–99. https://doi.org/10.30764/1819-2785-2019-14-2-92-99
Additional information
Translated by A. Ovchinnikova
Bebeshko, G.I., Lyubetskaya, I.P., Omel’yanyuk, G.G. et al. Methodological Approaches to Calculating Key Validation Parameters of Forensic Methods. Inorg Mater 57, 1385–1392 (2021). https://doi.org/10.1134/S0020168521140028
DOI: https://doi.org/10.1134/S0020168521140028