Introduction

There is a continuous and increasing need for reliable analytical methods to assess compliance with national and international requirements in all areas of analysis (Thompson and Wood 1993, 1995; Horwitz 1995; Thompson et al. 1999). The reliability of a method is determined by some form of validation, i.e., a procedure providing evidence that an analytical method is suitable for its intended purpose (Balls and Fentem 1997; Green 1996). Based on the results of a validation study, a method is judged fit, or not, for its intended purpose.

In most cases, formal validation requires the assessment of the performance of the proposed method by means of an interlaboratory study (although alternatives are possible; Thompson et al. 2002), also known as a collaborative study or ring trial. Many national and international protocols defining criteria for the organization of these collaborative studies and the interpretation of their results are available and routinely adopted. The International Organization for Standardization (ISO 5725 1994) and the International Union of Pure and Applied Chemistry (IUPAC 1988, 1995) provide comprehensive standards describing statistical procedures to assess analytical method performance. The World Health Organization (WHO 1980) provides a detailed description of acceptance/rejection criteria for immunoassay kits and other protein-binding systems. The Association of Official Analytical Chemists International (AOAC International, http://www.aoac.org) is a recognized worldwide authority on method validation and quality analytical measurements. In the area of food quality and safety, the Codex Alimentarius Commission (FAO–WHO 2005) requires the availability of specific performance information before a method is included in a Codex commodity standard. Guidelines for the evaluation of methods of analysis of genetically modified organisms are made available by the European Network of GMO Laboratories (ENGL, through the website of the Community Reference Laboratory for GM Food and Feed of the European Commission, http://gmo-crl.jrc.it). In the pharmacopoeial field, the International Conference on Harmonisation (ICH 1995, 1997) and the US Pharmacopoeia (USP 2003) documents provide basic guidance for validation studies.

According to internationally standardized procedures, of which the examples mentioned above are only a selection, formal validation studies should provide detailed information on the conditions of a method's applicability, together with estimates for a series of validation measures; both are necessary to assess overall method performance.

The objectives of this paper are, first, to provide a critical review of how the results of analytical method validation studies are interpreted and, second, to propose a new procedure that summarizes the information provided by individual validation indices and test statistics into comprehensive indicators of method performance. Through the application of fuzzy logic (Hall and Kandel 1991), we propose aggregated indicators as suitable tools for the global evaluation of analytical methods, also allowing objective comparison across different methods.

Analytical Method Validation: Background and Prospective Issues

Test Statistics and Numerical Indices

In agreement with international guidelines and protocols (ISO 5725 1994; IUPAC 1988, 1995; WHO 1980; FAO–WHO 2005; ICH 1995, 1997; USP 2003; ISO 2007), a collection of test statistics (values derived from sample information with an associated probability, like the Student's t or Fisher's F statistic) and numerical indices (quantities that represent some properties of sampled data, e.g., departure from expected values), hereafter "validation metrics," are routinely used to assess method performance. Data checks are commonly performed to verify features such as normality of distributions, homogeneity of variances, and the presence of anomalous data (stragglers, outliers). Some statistical tests (e.g., analysis of variance, linear regression analysis) are normally applied when one or more factors (or one or more regressors) are expected to affect the analytical response. Validation indices, instead, are used to provide information regarding a series of features that are internationally recognized as crucial for the overall evaluation of any analytical method's performance, e.g., trueness, precision, specificity, detection limit, quantification limit, linearity, and range. A brief description of the most relevant ones follows; abundant detailed documentation exists elsewhere (ISO 5725 1994; WHO 1980; ISO 2007).

Trueness is defined as the departure of the average result of a method from a reference value. Precision is defined as the closeness of agreement between independent results obtained under stipulated conditions; it is evaluated and expressed in terms of standard deviation (or derived measures). Depending upon which factors (laboratory, analyst, instrument, time frame) are considered as possible sources of variation, three types of precision can be estimated: repeatability (all factors held constant), which provides information on the minimum variation intrinsic to a method; intermediate precision (one or more of the factors analyst, instrument, or time frame allowed to vary); and reproducibility (the factor laboratory also allowed to vary), which provides information on the maximum variation that can be expected for a method. The assessment of a method by quantitatively analyzing the effects of deliberate, known variations in the operating conditions is known as robustness (or ruggedness) testing. The quantification limit is defined as the smallest measured content above which a determination of the analyte is possible; it is generally associated with the limit of detection, the lowest concentration of analyte that can be detected, but not necessarily quantified, by the analytical method. Linearity is the ability of a method to elicit results that are directly, or through some mathematical transformation, proportional to the analyte concentration within a given range of application of the method; it is assessed via statistical analysis of the correlation coefficient, y-intercept, slope, mean square error, and lack of fit. Finally, specificity is the capacity of a method to respond exclusively to the analyte of interest.
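
For concreteness, the conventional expressions behind the bias and relative precision measures can be written as follows (standard notation, not reproduced from the cited documents):

$$\text{Bias}\,(\%) = \frac{\bar{x} - \mu}{\mu} \times 100, \qquad \text{RSD}_r\,(\%) = \frac{s_r}{\bar{x}} \times 100, \qquad \text{RSD}_R\,(\%) = \frac{s_R}{\bar{x}} \times 100$$

where $\bar{x}$ is the mean of the measured results, $\mu$ the accepted reference value, and $s_r$ and $s_R$ the repeatability and reproducibility standard deviations, respectively.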

Limitations of the Current Approach

A full validation study requires an extensive collaborative study to obtain the data necessary to assess the performance of a method and its transferability among laboratories. However, according to Muire-Sluis (2004), analysts often point out that "validated methods may not be valid." The question often arises: What exactly makes a validated method valid? According to the Center for Biologics Evaluation and Research (CBER 2000), "the acceptability of analytical data corresponds directly to the criteria used to validate the method." This is a challenging issue because, once a method is validated and estimates of the various validation metrics are available, it is up to the analyst to define an acceptance criterion on the basis of prior knowledge of the measurements as well as of their intended application. For example, if an acceptance criterion for precision is a repeatability relative standard deviation (RSDr) of 30%, all methods showing RSDr lower than or equal to this threshold are acceptable. However, the acceptance criterion for a given validation statistic may not be easily defined by one threshold only; two thresholds delimiting a "fuzzy" area may be more suitable. For variability measures such as repeatability (within laboratory) or reproducibility (among laboratories), for example, the first threshold is the upper limit beyond which the method response is considered unacceptable, and the second is the lower limit below which the method is unquestionably good. Given the variability inherent in most laboratory instrument systems, the question of whether a validation statistic is "good" or "bad" can be difficult to answer. In some cases, intuition, experience, and knowledge of the practical context of the analytical data can be used to inspect or "eyeball" the data and assess a validation response. Most analysts would agree that very small or very large values of a validation metric determine a clear acceptance/rejection judgment. This was exemplified by Limentani et al. (2005): a difference of 100% in a measurement typically exhibiting a precision of 1% is a real difference, whereas a difference of 0.01% is irrelevant for the same measurement. Less clear-cut cases, however, which are very common in real-life situations, are much more problematic for objective judgment and decision making.

Another difficulty is that in some instances a method's reliability is assessed by means of one or a few metrics, whereas in other cases a more in-depth validation study may be required. When multiple metrics are used to assess a method's performance (the most common case), organizing the data, processing the results, and using them for an overall judgment is not a trivial issue. In summary, validation studies as currently carried out have an intrinsic subjectivity linked to the interpretation of the various metrics used, which is difficult to manage unless specific procedures improving objectivity are introduced and adopted.

Fuzzy-based Expert Systems: A Novel Approach

Each feature assessed via method validation (e.g., trueness, repeatability, and reproducibility) allows only a partial insight into the overall method performance. To provide a solid and comprehensive quality judgment of an analytical method, the simultaneous joint evaluation of all the metrics is necessary. This is especially relevant when metrics are contradictory, that is, when the performance of a method is satisfactory with respect to some features but not to others. In this respect, combining multiple metrics into aggregated measures is useful to achieve a comprehensive assessment of the method's response, to evaluate its performance under a variety of conditions (robustness/ruggedness testing), and to select the best among several available methods with respect to specific needs.

Despite the potential advantages, to date there is no clear objective strategy for combining results from different metrics without being strongly affected by subjectivity. Indeed, personal preferences and specific needs may influence the interpretation of validation studies' results. A strategy to improve objectivity is to weight the different metrics according to the intended use of the analytical procedure and to summarize the information via aggregation of the weighted metrics. However, the aggregation cannot merely rely upon summation, multiplication, or a combination of both (Keeney and Raiffa 1993). An effective approach to aggregating basic statistics relies, instead, on setting up a fuzzy expert system using decision rules (Hall and Kandel 1991). This technique is robust to uncertain and imprecise data, such as subjective judgments, and allows the aggregation of dissimilar metrics in a consistent and reproducible way (Bouchon-Meunier 1995). For these reasons, we propose here to summarize all the information collected during the validation study (independent metrics) via fuzzy logic, following a fully described methodology (Bellocchi et al. 2002).

Fuzzy logic is derived from fuzzy set theory and deals with reasoning that is approximate rather than precisely deduced from classical, "crisp" logic (Klir et al. 1997). Fuzzy logic uses predicates such as "good"–"bad," "high"–"low," and the like. Attaching a quality judgment to analytical methods, such as "accurate"–"inaccurate" or "precise"–"imprecise," follows as a consequence of validation (Holden et al. 2005). Under the "crisp" set assumption, an element can only be in a set or not: if, for instance, a subset consists of the methods with RSDr not exceeding 30% (the maximum acceptable standard deviation over the mean), a particular method either is or is not a member of that subset. If, however, the subset is defined as that of "precise" methods, it is more difficult to determine whether a specific method belongs to it. If one decides that only methods with RSDr ≤ 30% are in the subset, then a method with RSDr = 30.1% must be classified as "imprecise" even though it is "almost precise." The use of fuzzy set theory is compelling because available threshold values for precision and other relevant validation metrics are often vague and/or uncertain; thus, a classification based on an abrupt transition between sets is doubtful. Fuzzy set theory addresses this type of problem by allowing one to define the "degree of membership" of an element in a set by means of a membership function. For "crisp" sets, the membership function takes only two values: 0 (nonmembership) and 1 (membership). In fuzzy sets, the membership function can take any value in the interval [0, 1]. The value 0 represents complete nonmembership, the value 1 represents complete membership, and values in between (the transition interval) represent partial membership.
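
A minimal sketch may make the crisp/fuzzy distinction concrete; the 15% and 30% thresholds and the linear ramp below are illustrative choices for this example only (the expert system described later uses an S-shaped transition instead):

```python
def crisp_precise(rsdr: float, limit: float = 30.0) -> int:
    """Crisp membership: a method is either precise (1) or imprecise (0)."""
    return 1 if rsdr <= limit else 0

def fuzzy_precise(rsdr: float, good: float = 15.0, bad: float = 30.0) -> float:
    """Fuzzy membership in the 'precise' set: 1 below `good`, 0 above `bad`,
    with a linear transition in between."""
    if rsdr <= good:
        return 1.0
    if rsdr >= bad:
        return 0.0
    return (bad - rsdr) / (bad - good)

# Crisp classification flips abruptly between RSDr = 29.9% and 30.1% ...
print(crisp_precise(29.9), crisp_precise(30.1))   # 1 0
# ... while fuzzy membership makes 'almost precise' explicit.
print(round(fuzzy_precise(20.0), 2))              # 0.67
print(round(fuzzy_precise(29.9), 2))              # 0.01
```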

Materials and Methods

For this study, we used the data from the validation carried out on two analytical methods for genetic modification (GM) detection, TC1507 (Mazzara et al. 2005) and DAS59122 (Mazzara et al. 2006; simplified as 59122 hereafter). The validation was performed through an interlaboratory study under the coordination of the European Commission Community Reference Laboratory for Genetically Modified Food and Feed (CRL-GMFF, Ispra, Italy). These studies complied with European requirements for the authorization of GM food or feed products (Regulation (EC) no. 1829/2003 2003 and Regulation (EC) no. 641/2004 2004), and the outcome of the validation trials fulfilled the method performance requirements established by the ENGL. The two event-specific quantitative detection methods were therefore declared fit for the purpose of regulatory compliance (http://gmo-crl.jrc.it/summaries/TC1507-report_mm.pdf, http://gmo-crl.jrc.it/summaries/59122_val_report.pdf). A fuzzy logic-based indicator was developed to evaluate both methods when used to quantify different concentrations (levels) of the GM analyte. The indicator values computed at each level were averaged to attach a synthetic judgment to each method's results.

Experimental Design

Participating laboratories were provided with the method, materials, and reagents necessary to perform the validation study of the event-specific method for the quantification of GM maize line TC1507 (Mazzara et al. 2005) and of the event-specific method for the quantification of GM maize line 59122 (Mazzara et al. 2006).

The deoxyribonucleic acid (DNA) control sample submitted by the applicant and method developer (Pioneer Hi-Bred International) at 100% GM (TC1507 and 59122) and DNA from a conventional variety were mixed to prepare standard and blind samples at different GM percentages (from 0% to 5%; Mazzara et al. 2005, 2006).

For the validation of the event-specific method for the quantification of TC1507, 12 unknown samples, representing six GM levels, were used in the validation study. On each polymerase chain reaction (PCR) plate, six samples were analyzed in parallel with both the TC1507- and the hmg-specific systems (Krech 1999). Two plates in total were run, with two replicates for each GM level analyzed on the same run. Each sample was analyzed in three PCR repetitions, i.e., loaded in three parallel plate wells.

For the validation of the event-specific method for the quantification of 59122, 20 unknown samples, representing five GM levels, were used in the validation study. On each PCR plate, ten samples were analyzed in parallel with both the 59122- and the hmg-specific systems (Krech 1999). Two plates in total were run, with two replicates for each GM level analyzed on each run. Each sample was analyzed in triplicate.

The quantification of unknown samples was carried out by quantifying the GM event relative to the total maize quantity, using separate calibration curves analyzed in the same PCR run, i.e., one calibration curve for the GM event (TC1507 or 59122, respectively) and one for total maize DNA. The relative amount of GM event DNA in total maize DNA of the blind samples was obtained by dividing the GM copy number (TC1507 or 59122, respectively) by the maize copy number for each blind sample. The copy numbers of the blind samples were obtained by interpolation from the calibration curves, built by plotting threshold cycle (Ct) values against the logarithm of the copy number of the standard samples (determined by dividing the sample DNA weight in nanograms by the published average 1C value for maize genomes; Arumuganathan and Earle 1991). Detailed results of the validation studies as well as the related protocols are available on the CRL-GMFF website at http://gmo-crl.jrc.it/statusofdoss.htm (Mazzara et al. 2005, 2006).
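
As an illustration of this quantification scheme, the sketch below interpolates copy numbers from calibration curves of the form Ct = intercept + slope · log10(copies); all slopes, intercepts, and Ct values are invented for the example and are not taken from the validation reports:

```python
def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Interpolate a copy number from a calibration curve
    Ct = intercept + slope * log10(copies)."""
    return 10 ** ((ct - intercept) / slope)

def gm_percent(ct_gm, ct_ref, curve_gm, curve_ref):
    """Relative GM content: GM-event copies over taxon-reference copies."""
    gm = copies_from_ct(ct_gm, *curve_gm)
    ref = copies_from_ct(ct_ref, *curve_ref)
    return 100.0 * gm / ref

# Hypothetical (slope, intercept) pairs for the two curves and blind-sample Cts:
curve_gm, curve_ref = (-3.35, 40.0), (-3.30, 39.5)
print(round(gm_percent(32.1, 24.9, curve_gm, curve_ref), 2))  # ~0.86 (% GM)
```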

Fuzzy Logic-based Validation

A fuzzy logic-based procedure, built on the multivalued fuzzy sets introduced by Zadeh (1965) and following the Sugeno method of fuzzy inference (Sugeno 1985), was used to design an expert system for the validation of GMO analytical methods. The computational details of fuzzy-based systems for use in validation are given in a previous publication (Bellocchi et al. 2002) and are only briefly reported hereafter.

Each metric used in the validation work has, according to expert judgment, a membership value for two possible classes, i.e., the favorable (F) and the unfavorable (U) class. Validation metrics such as percent mean bias or percent repeatability standard deviation range from F to U as their values increase. For such metrics, the membership to the U class is defined using the following monotonically increasing, S-shaped curve:

$$S\left( x; a, b \right) = \begin{cases} 0 & x \leq a \\[4pt] 2\left( \dfrac{x-a}{b-a} \right)^{2} & a \leq x \leq c \\[4pt] 1 - 2\left( \dfrac{x-b}{b-a} \right)^{2} & c \leq x \leq b \\[4pt] 1 & x \geq b \end{cases}$$
(1)

where: x = the value of the basic validation metric, a = the lower threshold (values of x lower than a have membership to the U class equal to 0 and to the F class equal to 1), b = the upper threshold (values of x greater than b have membership to the U class equal to 1 and to the F class equal to 0), and c = (a + b)/2. The complement, 1 − S(x; a, b), gives the degree of membership of the metric value x to the class F. For a metric like real-time PCR (RT-PCR) efficiency, the transition from F to U occurs with decreasing values; for this type of metric, therefore, Eq. 1 defines the membership to F, while its complement gives the membership to U.
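
A direct transcription of Eq. 1 may help; the thresholds in the usage example (a = 15, b = 25 for RSDr) anticipate those adopted for the accuracy module in the Results:

```python
def s_curve(x: float, a: float, b: float) -> float:
    """S-shaped membership function of Eq. 1: 0 below a, 1 above b,
    with a quadratic transition through c = (a + b) / 2."""
    c = (a + b) / 2.0
    if x <= a:
        return 0.0
    if x <= c:
        return 2.0 * ((x - a) / (b - a)) ** 2
    if x <= b:
        return 1.0 - 2.0 * ((x - b) / (b - a)) ** 2
    return 1.0

# For an 'increasing is worse' metric such as RSDr (%), S gives membership to U:
u = s_curve(20.0, 15.0, 25.0)   # 0.5 at the midpoint of the transition
f = 1.0 - u                     # complement: membership to F
print(u, f)                     # 0.5 0.5
```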

The rules for aggregating the metrics into a unique module value are based on two factors: the memberships to the F and U classes and expert weights. For each combination of input memberships, an expert weight is assigned. Since each metric has a membership to both F and U, four expert weights are needed for two metrics, eight for three metrics, and so on. As an example, the case of two metrics is outlined to show the relationship between inputs and outputs in linguistic terms by "if–then" statements:

Premise                     Conclusion
If x1 is F and x2 is F      then y1 is w1
If x1 is F and x2 is U      then y2 is w2
If x1 is U and x2 is F      then y3 is w3
If x1 is U and x2 is U      then y4 is w4

where xi is an input variable, yi is an output variable, and wi is an expert weight. The value of each conjunction (… and …), called the "truth value" (vi), is the minimum of the memberships to the classes (F or U) obtained from the S-shaped curves. The two inputs of the above example can be translated into metrics used for method validation, e.g., RSDr and RSDR:

If RSDr is F and RSDR is F      then p1 is 0.0
If RSDr is F and RSDR is U      then p2 is 0.5
If RSDr is U and RSDR is F      then p3 is 0.5
If RSDr is U and RSDR is U      then p4 is 1.0

where pi (i = 1, …, 4) indicates a generic output reflecting method precision.

The application of the rules generates a single fuzzy set that includes several output values (four when combining two inputs) and is defuzzified to resolve a single crisp output value (i.e., a value between 0 and 1). The centroid method is used to obtain the representative nonfuzzy output value, as commonly adopted in Sugeno-type systems (Sugeno 1985): the vi · wi values generated from each combination are summed and divided by the sum of all the truth values, i.e., $\sum_i v_i \cdot w_i / \sum_i v_i$. The expert reasoning runs as follows: if all input variables are F, the value of the module is 0 (good response from the analytical method according to all the metrics used); if all metrics are U, the value of the module is 1 (bad response according to all the metrics used); all the other combinations assume intermediate values. The F and U limits may come from experience, may be extracted from the literature, or may be set by law. Under the general criterion that a higher weight is assigned to propositions that the expert judges farther from good performance, the weights can be chosen based on the analyst's own experience with each validation metric. It follows that weights equal to 0 and 1 are assigned to propositions containing only F or only U values, respectively, while intermediate, expert-chosen values are assigned to the other combinations.
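
The following sketch assembles the steps described above (min-conjunction truth values and weighted-centroid defuzzification) into a generic aggregation routine. It reflects our reading of the procedure, not code from the cited publications; the membership values in the usage example are arbitrary:

```python
from itertools import product

def aggregate(memberships, weights):
    """Sugeno-style aggregation of metrics into a module value.

    memberships: one (mu_F, mu_U) pair per metric, e.g., from the S-curves.
    weights: expert weight for each class combination such as ('F', 'U');
             the all-F combination must map to 0.0 and the all-U one to 1.0.
    Returns sum(v_i * w_i) / sum(v_i), where each truth value v_i is the
    minimum of the memberships selected by the combination.
    """
    num = den = 0.0
    for combo in product('FU', repeat=len(memberships)):
        v = min(mu_f if c == 'F' else mu_u
                for c, (mu_f, mu_u) in zip(combo, memberships))
        num += v * weights[combo]
        den += v
    return num / den

# Two-metric precision example (RSDr, RSDR) with the weights of the rules above:
weights = {('F', 'F'): 0.0, ('F', 'U'): 0.5, ('U', 'F'): 0.5, ('U', 'U'): 1.0}
print(round(aggregate([(0.8, 0.2), (0.3, 0.7)], weights), 3))  # 0.464
```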

Results

Fuzzy Logic-based Expert System for GMO Method Validation

Figure 1 illustrates the implementation of fuzzy logic for method validation: first, six metrics computed in GMO method validation (percent repeatability standard deviation, percent reproducibility standard deviation, percent mean absolute bias, number of reagents required for an RT-PCR analyte quantification, and percent RT-PCR efficiency for both the GM-specific system and the taxon-specific reference system) are transformed into fuzzy modules (accuracy, practicability, efficiency); second, the three modules are aggregated into a synthetic indicator. In this context, the term module indicates a validation measure calculated via the fuzzy-based procedure from one or more basic metrics. For each module, a dimensionless value between 0 (best response) and 1 (worst response) is calculated. A two-stage design of a fuzzy-rule-inferring system is thus applied: metrics with similar characteristics are first aggregated into modules; then, using the same procedure, the modules are aggregated into a second-level integrated index (again ranging from 0 to 1), called the indicator. The modules in Fig. 1 cover the criteria that the performance assessment of GMO analytical methods should consider (Table 1): (1) the closeness of the analytical response to the true value (expressed in terms of percent bias, B) and the extent to which it varies within a laboratory (percent repeatability standard deviation, RSDr) and across laboratories (percent reproducibility standard deviation, RSDR), (2) the complexity of the method (interpreted by the number of reagents required), and (3) the efficiency of the analytical procedure (RT-PCR) in terms of amplification rate (expressed by the two efficiency measures).
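
The resulting two-stage hierarchy can be summarized in a small configuration sketch (the layout and identifiers are ours; the numerical weights are those reported below in the description of the modules):

```python
# Two-stage hierarchy of Fig. 1: six metrics -> three modules -> one indicator.
# Each module lists its metrics with their within-module contributions and
# carries its own weight in the final indicator.
FUZZY_SYSTEM = {
    "accuracy":       {"weight": 0.60, "metrics": {"RSDr": 0.35, "RSDR": 0.50, "Bias": 0.15}},
    "practicability": {"weight": 0.20, "metrics": {"Score": 1.00}},
    "efficiency":     {"weight": 0.20, "metrics": {"E1": 0.50, "E2": 0.50}},
}
```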

Fig. 1

Structure of the fuzzy-based method validation system, where: RSDr (%) repeatability standard deviation; RSDR (%) reproducibility standard deviation; Bias (%) mean absolute bias; Score number of RT-PCR reagents; E1 (%) GM test efficiency rate; E2 (%) reference gene efficiency rate; F favorable threshold; U unfavorable threshold; S S-shaped membership function; x value of metric; a minimum value between F and U; b maximum value between F and U

Table 1 Relative incidence of each basic validation metric on the fuzzy-based indicator

As an example, Fig. 2 shows the S-shaped curves for the efficiency rate (E). The S functions are flat, at values of 0 and 1, for E ≤ 0.90 (U limit) and E ≥ 0.98 (F limit). An intermediate response is associated with values of the efficiency rate falling within the transition interval, in which the membership value for F increases from 0 (at E = 0.90) to 1 (at E = 0.98) and the membership value for U decreases from 1 (at E = 0.90) to 0 (at E = 0.98). Figure 2 shows the membership values to F and U for E = 0.91. The weights implemented in the framework of Fig. 1 are only exemplary and refer to the various combinations of F and U (e.g., 0.40 refers to the FFU combination in the accuracy module). The relative incidence of each validation metric on the indicator can be deduced by combining the weights of the validation metrics within their own module with those of the modules within the indicator (Table 1). For instance, the relative incidence of 0.30 for RSDR reflects the highest impact of this metric on the final indicator. According to the weights set in the fuzzy system, the efficiency measures are instead the least impacting (relative incidence equal to 0.10). This means that poor reproducibility easily turns into low overall performance for a method, while poor efficiency weighs comparatively little.
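
Reading "combining" as multiplying each metric's within-module contribution by its module's weight in the indicator (both sets of weights are given in the next paragraphs), the relative incidences follow as:

$$\begin{aligned}
\text{RSD}_R &: 0.50 \times 0.60 = 0.30, & \text{RSD}_r &: 0.35 \times 0.60 = 0.21, & \text{Bias} &: 0.15 \times 0.60 = 0.09,\\
\text{Score} &: 1.00 \times 0.20 = 0.20, & E_1 &: 0.50 \times 0.20 = 0.10, & E_2 &: 0.50 \times 0.20 = 0.10,
\end{aligned}$$

which sum to 1.00 and reproduce the values quoted above for RSDR (0.30) and the efficiency measures (0.10 each).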

Fig. 2

Membership to the fuzzy sets favorable (efficiency) and unfavorable (no efficiency) for a hypothetical method response in terms of efficiency (E); example with the membership values for E = 0.91

Weights and thresholds of the validation metrics indicated in Table 1 were attributed by the authors based on the current understanding of satisfactory method performance characteristics as set forth by the European Network of GMO Laboratories (in particular for the attribution of the U membership values) and on the expertise of the CRL-GMFF, established by Regulation (EC) no. 1829/2003 of the European Parliament and of the Council as responsible for the validation of methods of detection of GMOs in food and feed (Regulation (EC) no. 1829/2003 2003).

In the fuzzy module "accuracy," three metrics were considered. For the percent repeatability standard deviation (RSDr), thresholds set at 25 (U) and 15 (F) indicate that RSDr values exceeding 25% are beyond the limit of acceptable performance, while values below 15% are optimal. The same thresholds were attributed to the percent mean absolute bias, while a less strict transition area (20 and 35 for F and U, respectively) was defined for the percent reproducibility standard deviation (RSDR). The three validation metrics were then combined in the module, with contributions weighted to privilege the variability terms RSDR and RSDr (respective contributions of 0.50 and 0.35) over the bias, whose weight was set to 0.15.

Given the current state of the art in the GMO testing field, where the presence of GMOs in food or feed commodities is best detected and quantified by means of applications stemming from PCR (Holst-Jensen et al. 2003), the "practicability" module was designed to account for the simplicity of the protocol that the operator has to follow to set up a real-time PCR assay. In particular, the number of different components that have to be added separately, and in different amounts, to a reaction tube may affect the final results through an increased risk of errors, e.g., in pipetting. For this reason, an F threshold of 6 and a U threshold of 12 were assigned to the practicability module, based on the numbers of reaction components indicated in the protocols of the methods validated by the CRL-GMFF.

The module "efficiency" combines the efficiencies of the GM-specific and of the taxon-specific reference system, whose U and F thresholds were set to 90% and 98%, respectively. The efficiency of each system was determined from the slope of the respective standard calibration curve using the formula $\left[\left(10^{-1/\text{slope}}\right) - 1\right] \times 100$. The two efficiency values contribute to the same extent to the module. Finally, the three modules "accuracy," "practicability," and "efficiency" were aggregated into the synthetic indicator, with respective contributions set at 0.60, 0.20, and 0.20.
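
A one-line implementation of the efficiency formula, with two hypothetical slopes chosen to show where the thresholds fall:

```python
def pcr_efficiency(slope: float) -> float:
    """Percent RT-PCR efficiency from a calibration-curve slope,
    via [(10**(-1/slope)) - 1] * 100 as given in the text."""
    return ((10 ** (-1.0 / slope)) - 1.0) * 100.0

# A slope of -3.32 corresponds to the theoretical doubling per cycle:
print(round(pcr_efficiency(-3.32), 1))  # ~100.2, above the 98% F threshold
print(round(pcr_efficiency(-3.60), 1))  # ~89.6, below the 90% U threshold
```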

Functioning of the Fuzzy-based Expert System for GMO Method Validation

To illustrate the functioning of the fuzzy-based expert system, basic metrics, modules, and the final indicator were computed over the validation analysis of the methods of detection of GM maize lines TC1507 and 59122 (Table 2). These values are used here as examples of the type of output gained from a fuzzy-supported validation analysis of an interlaboratory study. The example shows that method TC1507 performs globally well according to the overall indicator at all analyte levels (on average equal to 0.1305). A good method, however, does not necessarily show perfect accuracy at all GM levels: the "accuracy" module equals 0.00961 at the 0.1% GM level and 0.0000 at the 0.5%, 0.9%, and 2.0% levels. The relative strength of method TC1507 also resides in the technical modules (efficiency and practicability), whose values approach the optimum thresholds. Method 59122 (average indicator equal to 0.4396) is characterized by optimal values of the "accuracy" module at the highest GM levels, namely 2.0% and 4.5%, but performs less well at the lowest analyte level (module equal to 0.4580 at the 0.1% level), along with generally poor values of practicability (module equal to 1.0000) and efficiency (module equal to 0.8400).

Table 2 Validation response of two GMO analytical methods: basic metrics, modules, and overall indicator

These results are the consequence of the choices made regarding the selection of input measures, the definition of transition intervals, and the values given to the conclusions of the decision rules. As such, the output of the fuzzy-supported exercise presented in this article is by no means intended as a comparative analysis of the basic metrics, modules, or overall indicators of two validated methods for the detection of two different GMOs, both already declared fit for the purpose of regulatory compliance. Rather, this is a first attempt to describe, apply, and show how the fuzzy logic principle may be used to combine several performance statistics of method validation into a synthetic numerical indicator. These illustrative results should be substantiated by a finer-tuned definition of weights and transition intervals as more data are processed. A sensitivity analysis conducted over a broad range of weights and transition intervals could help identify how the variability of such subjective elements affects the indicator, but this step goes beyond the scope of this paper.

Discussion

In the design of a process to assess the performance of an analytical method, two major questions should be addressed: (1) which validation metrics should be taken into account and (2) how the information from different validation metrics should be combined and interpreted. Provided that an answer to the first question is given by the analysts according to recognized guidelines, the fuzzy-based inferential system illustrated here proposes an answer to the second question through two key elements: the use of fuzzy sets and the use of decision rules. The use of fuzzy sets provides a well-designed solution to the problem of deciding cutoff values for basic validation metrics, i.e., the limits between the F response, the U response, and the transition response. The use of decision rules provides a rationale for aggregating validation metrics into modules. The combination of these two concepts (limits in the response, mode of aggregation) into sets of fuzzy-based rules is attractive because, although the possible combinations of validation metric values are infinite, a single set of fuzzy rules connects them all.

The system proposed is based on a compromise between operational needs (i.e., validation of analytical methods) and flexibility (hierarchization of objectives and aggregation of statistics). The objective of an expert system is the simulation of human expertise, and an expert system is substantiated if it displays, under a variety of conditions, the same responses that human expertise would provide (Plant and Stone 1991). Experts are therefore invited to comment on the setup and results of the fuzzy inferential system. If there is a disagreement between the expert perception of method performance and the output of the expert system, the cause of the divergence should be examined in view of: (1) the choice of basic validation metrics, (2) the choice of limits on the transition interval, (3) the formulation of the decision rules, and (4) the formulation of the mode of aggregation of the modules. All of these points may be modified according to expert consensus, after extensive testing of the methodology.

The fuzzy expert system proposed is able to reflect the analyst's expert judgment about the quality of method performance. Such an approach provides a comprehensive assessment, making it easier to identify the best performing methods. It is also computationally efficient and well suited to mathematical analysis. Its utility has already been proven in the validation of agro-meteorological and soil models (Donatelli et al. 2004; Rivington et al. 2005), where it allowed a fine level of detail to be gained about model quality and behavioral characteristics; to the best of the authors' knowledge, however, the approach has not so far been applied to the validation of analytical methods. It can find applicability within analytical method validation strategies, provided that a wide consensus about weights and limits is achieved, and can be valuable for a number of reasons:

  1. It allows the analysts to express mathematically individual or collective preferences (uncertainty factors).

  2. It highlights the degree of method failure/goodness associated with each piece of information (basic validation metrics and aggregated modules).

  3. It elucidates the degree of reliability of the responses associated with alternative methods.

  4. It puts the various components of the validation process (metrics, modules) in a formal structure.

  5. It summarizes several sources and levels of information into a single value.

  6. It allows the examination of possible compensations among different metrics and modules.

The modular structure of the fuzzy-based expert system also presents other advantages. First, users have access both to a synthetic indicator, reflecting the overall judgment, and to the individual modules. This allows complete transparency of each step and, consequently, better control of the process itself. Second, the mode of aggregation of the modules can be changed, and new modules can be added. The multivalued nature of a validation process is explicitly stated, the rules are easy to read, and the numerical scores used for their conclusions are easy to tune to match expert opinions. The method illustrated is flexible and can be extended to aggregate more validation metrics of different types; however, experience suggests that three groups of metrics, with three metrics per group, are practical upper limits for facilitating the interpretation of results.

The freeware Analytical Method Performance Evaluation (Acutis et al. 2007) provides support for fuzzy-based aggregation. It allows the creation of reusable modules and indicators for use in method validation, thus serving as a convenient means of supporting collaborative work among networks of scientists involved in fuzzy-based method validation.