Introduction

The choice of the most appropriate method for quantifying plant disease depends on various factors including the objective of the work, availability of equipment and feasibility, spatial scale (organ, plant, plot or field), as well as time and cost constraints (Madden et al. 2007). Measures of disease intensity include prevalence, incidence and severity. Plant disease severity, or the degree to which a specimen (plant or plant part) is diseased (Bock et al. 2021a), can be expressed by various means. A commonly used metric of disease severity is the percentage area of the considered plant tissue affected by the presence of disease symptoms on the host or signs of the pathogen (Nutter et al. 1991). Disease severity is particularly valuable to (1) quantify and understand plant disease epidemiology, (2) quantify yield losses, (3) evaluate control methods, and (4) screen plant genotypes for host resistance (Bock et al. 2016).

Disease severity based on the percentage of plant tissue area visibly diseased is most commonly estimated via visual assessment using the human eye as a remote sensor (Bock et al. 2010, 2016). Despite recent advances in technology and the availability of tools for measuring disease severity based on remote sensing and image analysis (e.g., image segmentation by color thresholding), visual assessment prevails, especially in field research (Bock et al. 2020, 2021b). Visually, a rater perceives and assesses the percentage of diseased area (via estimation) or, in some cases, depending on the type of scale used, applies a class grade based on an ordinal scale that encompasses a distinct pattern or a percentage interval (Bock et al. 2016; Chiang et al. 2014).

Estimation of percentage areas is deemed more challenging than classification of severity into ordinal classes, considering the large number of options to choose from on the percentage scale, compared to the finite and small number of classes in ordinal scales (Large 1966; James 1974). Moreover, estimation is subjective and known to vary considerably among raters (Nutter 1993; Bock et al. 2009). Specific tools have been proposed to train raters prior to the assessment process and to aid them during it so that the visual estimates are as accurate as possible. Accuracy of a severity estimate is its closeness to the actual value; it is therefore a qualitative concept that can only be assessed in relation to what are accepted as the “actual,” “true,” or “gold standard” values (Bock et al. 2016). Consistently accurate estimates of severity will be defined as “precise” (statistically less variable), but reliable or precise estimates are not necessarily accurate (Madden et al. 2007). Hence, precision is a second important component of overall accuracy. The definition and importance of accuracy in visual estimates have been recently reviewed and discussed (Bock et al. 2016).

Standard area diagram sets (SADs), or a similar pictorial or graphic representation of selected disease severities on a plant organ, are among the aids used to calibrate a rater's eyes for performing a visual assessment (Del Ponte et al. 2017). SADs were first developed over a century ago but have received considerable attention during the last quarter-century, in the modern era of plant pathology research (Del Ponte et al. 2017; Bock et al. 2020). Indeed, from a handful of black and white hand drawings of diseased specimens developed at the end of the nineteenth century (Cobb 1892), recent research has proposed more elaborate SADs that have been validated, i.e., demonstrated to increase accuracy and reliability of visual estimates compared with unaided estimates (Del Ponte et al. 2017). In the last 20 years, plant pathologists have taken advantage of advances in image processing and analysis tools and of knowledge gained from the psychophysical and measurement sciences to develop SAD sets that are realistic (e.g., true-color photographs), with appropriate validation and illustrated severities to maximize estimation accuracy (Domiciano et al. 2014; Schwanck and Del Ponte 2014; Araújo et al. 2019; Franceschi et al. 2020).

To date, research in the field has focused largely on the proposition and validation of new SADs for diseases that lack the tool and much less on improving upon existing ones (Domiciano et al. 2014; Schwanck and Del Ponte 2014; Araújo et al. 2019) or exploring the effects of SAD-specific factors (e.g., diagram appearance and number, scale structure, instructions) or other aspects that may affect rater performance including symptomatic patterns (Domiciano et al. 2014; Schwanck and Del Ponte 2014; Bock et al. 2015; de Melo et al. 2020). A list of future research needs and best operating practices for developing and using SADs has been prepared to guide researchers in the design and evaluation of more effective systems that will ultimately lead to improved accuracy (Del Ponte et al. 2017; Bock et al. 2020). Since our previous review published on the topic (Del Ponte et al. 2017), 50 further studies have been published. A qualitative evaluation suggests the use of SADs results in less bias and consequently greater precision. However, the degree of improvement in the accuracy and precision of estimates, the circumstances of rating, and the characteristics of symptoms (size, number, distribution, and maximum severity) under which SADs have proven most valuable have not been characterized in a systematic manner, although some early research indicated effects of lesion size, number, and distribution (Amanat 1976; Kranz et al. 1977; Sherwood 1983; Forbes and Jeger 1987; Hock et al. 1992).

A common statistical method to investigate the overall accuracy of visual estimates of severity is Lin’s concordance correlation coefficient (LCCC) (Lin 1989), which is the product of precision (as Pearson's correlation coefficient, r) and a generalized bias correction factor that combines scale-shift (systematic bias) and location-shift (constant bias) measures (Nita et al. 2003; Madden et al. 2007). Prior to LCCC's first use in SAD research (Spolti et al. 2011; Yadav et al. 2013), linear regression models were the statistical standard used to relate rater estimates to the actual severity values (see reviews by Del Ponte et al. 2017; Bock et al. 2020). Indeed, linear regression continues to be used for evaluating SADs (Santos et al. 2017; Camara et al. 2018; Trojan and Pria 2018; Arias et al. 2020; Robaina et al. 2020; Kublik et al. 2020). The two linear regression coefficients (intercept and slope), together with the coefficient of determination (R²), are usually reported for each rater in the SAD validation study, while those researchers reporting LCCC statistics have most often reported means of all raters combined (Del Ponte et al. 2017; Bock et al. 2020; Brás et al. 2020; Nascimento et al. 2020; Rivera et al. 2020; Castellar et al. 2021; Montero et al. 2021).
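As an illustration of how the LCCC decomposes into precision and bias, the following is a minimal Python sketch and not code from any of the cited studies. The function name `lin_ccc` and the dictionary keys are hypothetical. It follows the definitions in Lin (1989), in which the bias correction factor is computed from the scale shift (ratio of standard deviations) and the location shift (standardized difference of means), using divide-by-n moments.

```python
import math

def lin_ccc(estimates, actual):
    """Return Lin's CCC and its components for paired severity values (%)."""
    n = len(estimates)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = sum(estimates) / n
    mean_x = sum(actual) / n
    # Population (divide-by-n) variances and covariance
    var_y = sum((y - mean_y) ** 2 for y in estimates) / n
    var_x = sum((x - mean_x) ** 2 for x in actual) / n
    cov = sum((y - mean_y) * (x - mean_x) for y, x in zip(estimates, actual)) / n
    sd_y, sd_x = math.sqrt(var_y), math.sqrt(var_x)
    r = cov / (sd_y * sd_x)                           # precision (Pearson's r)
    v = sd_y / sd_x                                   # scale shift (1 = none)
    u = (mean_y - mean_x) / math.sqrt(sd_y * sd_x)    # location shift (0 = none)
    c_b = 2.0 / (v + 1.0 / v + u ** 2)                # bias correction factor
    return {"ccc": r * c_b, "r": r, "c_b": c_b,
            "scale_shift": v, "location_shift": u}
```

With estimates that equal the actual values, the sketch returns a CCC of 1; a rater who adds a constant to every estimate keeps r = 1 but obtains a bias correction factor below 1, and therefore a lower CCC.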

When using linear regression, the constant (intercept, a) and the first regression coefficient (slope, b) represent two measures of bias, constant bias and systematic bias, respectively. The closer a is to zero and b is to one, the smaller the bias, and thus the closer the visual estimates of percentage area are to the actual values (Teng 1981). The two coefficients considered separately do not inform us how close the estimates are to the actual severity values. The overall variability in estimates in relation to the actual values is provided by the coefficient of determination (R²). By taking the square root of R², we obtain Pearson’s correlation coefficient, r, which is a measure of precision and a component of the LCCC (Madden et al. 2007). In contrast to LCCC, which provides a global measure of concordance (or overall accuracy), precision and the two measures of bias in linear regression are evaluated independently.
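The three per-rater statistics described above can be obtained from a simple least-squares fit of the estimates on the actual values. The following Python sketch is illustrative only; the function name `regress_estimates` and its return keys are hypothetical and do not come from the original studies.

```python
import math

def regress_estimates(actual, estimates):
    """Least-squares fit of estimates = a + b * actual; returns a, b, R^2 and r."""
    n = len(actual)
    if n != len(estimates) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(actual) / n
    mean_y = sum(estimates) / n
    sxx = sum((x - mean_x) ** 2 for x in actual)
    syy = sum((y - mean_y) ** 2 for y in estimates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, estimates))
    slope = sxy / sxx                       # b: systematic bias if different from 1
    intercept = mean_y - slope * mean_x     # a: constant bias if different from 0
    r_squared = sxy ** 2 / (sxx * syy)      # coefficient of determination
    # The square root discards the sign, so this equals |r|; for severity
    # estimates the slope is normally positive, in which case |r| = r.
    r = math.sqrt(r_squared)
    return {"intercept": intercept, "slope": slope,
            "r_squared": r_squared, "r": r}
```

A rater whose estimates follow 3 + 1.1 × actual exactly would yield a = 3, b = 1.1, and r = 1: perfectly precise, yet biased in both senses.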

The current study is a sequel to our previous systematic review of those SADs that had been published in peer-reviewed journals up to 2017 (Del Ponte et al. 2017). We have since updated the database to include all additional published articles since 2017. Firstly, we considered all peer-reviewed SAD studies to systematically select studies and conduct a meta-analysis to obtain mean estimates of each of the three linear regression coefficients indicative of the two components of bias and precision for the visual estimates that were published in the original articles. Secondly, we explored factors related to disease symptom characteristics and SADs that could explain the variability in precision and bias of the unaided estimates as well as the gains in precision and reduction of bias resulting from use of SAD aids.

Material and methods

Data sources and variables used

In addition to the variables listed in our previous systematic review of published SADs (Del Ponte et al. 2017), we have included additional numeric variables representing the bias and precision of the visual estimates. Data on the regression statistics (intercept, slope and coefficient of determination) associated with the estimates of severity were obtained from each of the peer-reviewed studies in which SADs had been validated in a systematic manner, and for which the data existed for each rater in the study. Using the same bibliographic search criteria described in our previous review on the topic (Del Ponte et al. 2017), we were able to locate, catalogue, and extract information from an additional 48 articles published since 2017, which resulted in 153 (105 + 48) articles that validated SADs published between 1990 and 2021, regardless of the disease or host plant species. Finally, we classified each SAD in the 153 studies according to a nominal scale of symptomatic patterns created specifically for this study (details provided in a later section).

Criteria for study and rater selection

We systematically selected the studies for analysis following objective criteria (Fig. 1). We first queried our database to select studies where summary statistics (intercept, slope and coefficient of determination) of severity estimates were available by individual rater, and obtained the data for both the unaided and the SAD-aided estimates. Nine studies were not included due to a lack of validation, and 19 additional studies were not included due to a lack of unaided assessments. We excluded a further 32 studies for which the linear regression statistics were not available by individual rater. A further filter on rater number was applied to ensure that a minimum of six raters participated in the SAD validation. Based on this final screen, a further 15 studies were excluded. Thus, 78 studies remained, in which there were 923 raters who assessed severity on a range of diseases without and with SADs.

Fig. 1

Flow diagram describing the steps and criteria for systematically reviewing studies on standard area diagrams (SADs) for inclusion in a meta-analysis of the bias and precision of visual estimates of plant disease severity using SADs

A preliminary, exploratory analysis of all regression statistics showed outliers, defined as raters with exceptionally high or low intercepts and slopes, relative to zero and one, respectively, or with very low Pearson’s correlation coefficients, r (precision). We excluded raters, regardless of study, who exhibited an r < 0.3, an intercept value > 30 or < −30, or a slope > 5. A total of 75 raters (8.1%) were thus removed as outliers, reducing the total number of raters to 848. Finally, we eliminated a further six studies due to the number of raters being less than six. The final number of studies was 72 and included 823 raters (Fig. 1).
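The two screening steps can be sketched as follows. This is a hypothetical Python illustration, not the authors' code: the record layout (a dictionary with `study`, `r`, `intercept`, and `slope` keys) and the function names are assumptions, the intercept criterion is read as values outside ±30, and the text does not specify how the thresholds were applied across unaided and aided estimates.

```python
from collections import defaultdict

def is_outlier(rater):
    """Apply the exclusion thresholds to one rater's regression statistics."""
    return (
        rater["r"] < 0.3
        or abs(rater["intercept"]) > 30   # assumed reading: > 30 or < -30
        or rater["slope"] > 5
    )

def screen_raters(raters, min_raters=6):
    """Drop outlying raters, then drop studies left with fewer than min_raters."""
    kept_by_study = defaultdict(list)
    for rater in raters:
        if not is_outlier(rater):
            kept_by_study[rater["study"]].append(rater)
    return {
        study: kept
        for study, kept in kept_by_study.items()
        if len(kept) >= min_raters
    }
```

Applying the rater-level filter first and the study-level filter second mirrors the order described above, where six studies fell below six raters only after outliers had been removed.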

Lesion patterns

Each selected disease represented by a SAD was assigned to a nominal variable based on a descriptive pictorial key for classifying symptomatic patterns developed specifically for this study. The pictorial key comprising a 4-category scale was constructed based on characteristics of the lesions including number and size (single and large, few and medium sized, and numerous and medium to small in size), and the presence or absence of coalescence of the lesions as the disease progressed in the illustrated SADs. The pictorial key (Fig. 2) was used by five independent evaluators (the authors of this study), all experienced in disease assessment, to classify each SAD. The category that received the most votes was assigned to the SAD. Maximum severity represented in the SADs for each study was included in the analysis.
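The voting rule can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `majority_category` is hypothetical, the labels in the usage note are borrowed from the moderator classes used later in the analysis, and the handling of ties is not described in the text, so the sketch simply flags them.

```python
from collections import Counter

def majority_category(votes):
    """Return the most frequent category, or None when the top count is tied."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: the text does not say how ties were resolved
    return counts[0][0]
```

For example, votes of `["Nnc", "Nnc", "Nc", "Nnc", "F"]` from five evaluators would assign the SAD to `"Nnc"`.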

Fig. 2

Pictorial key used for the classification of symptoms based on the number, size, and coalescence of lesions

Exploratory analysis

First-order linear regression analysis was used in the original articles to make inferences regarding bias and precision of the visual estimates. We graphically explored the distribution of the intercepts and slopes and that of the correlation coefficient (calculated from the coefficient of determination) from the assessments without and with SADs. Subsequently, we averaged each regression statistic across raters and calculated the gain or loss by subtracting the mean value of the unaided estimates from the mean value for the SAD-aided estimates. For example, a positive or a negative difference in the Pearson’s r represents a gain or loss in precision, respectively. The data for each statistic were summarized as means and standard deviations (across raters) both unconditioned (overall means) and conditioned to each study and were explored graphically.
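The per-study summary described above amounts to a mean and standard deviation across raters for each method, followed by a difference of means. A minimal Python sketch, with a hypothetical function name and made-up example values in the usage note, could look like this:

```python
from statistics import mean, stdev

def summarize_study(unaided, aided):
    """Mean and SD across raters for one statistic, plus the gain or loss.

    The gain is the SAD-aided mean minus the unaided mean, so a positive
    value for Pearson's r indicates a gain in precision.
    """
    return {
        "mean_unaided": mean(unaided),
        "sd_unaided": stdev(unaided),
        "mean_aided": mean(aided),
        "sd_aided": stdev(aided),
        "gain": mean(aided) - mean(unaided),
    }
```

With illustrative (not published) r values of 0.80, 0.85, and 0.90 unaided and 0.92, 0.94, and 0.96 aided, the gain in precision would be 0.09. For the intercept and slope, the sign of the difference has to be interpreted relative to the reference values of zero and one, respectively.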

Meta-analysis of regression coefficients

We fitted an arm-based meta-analysis, also called a two-way unconditional linear mixed model (Madden et al. 2016), directly to the means for each of the three statistics. The model can be written as:

$${\mathrm{Y}}_{i}\sim N\left(\mu ,\Sigma +{S}_{i}\right), \tag{1}$$

where Yi is the vector of the means of the regression statistic (Pearson’s r, intercept, or slope) for the two treatments for the ith study, µ is a vector representing the mean of Yi across all studies, Σ is a 2 × 2 between-study variance–covariance matrix, and Si is the within-study variance–covariance matrix for the ith study. N indicates a multivariate normal distribution. The method (unaided or SAD-aided) was treated as a fixed effect and both the study and method as random effects, the latter to account for the dependency (as the same raters evaluated both without and with the aid). An unstructured Σ matrix was used, and the models were fitted to the data by maximum likelihood. We used the inverse of the sampling variance as a weighting variable. The sampling variance was calculated as the variance of each statistic's value across the raters divided by the number of raters for a specific study. Hence, studies with lower variance (indicating the method was more reproducible among raters) contributed more weight to the meta-analytic estimate. All models were fitted to the data using the rma.mv function of the metafor package (Viechtbauer 2010) of R (R Core Team 2021).
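To make the weighting concrete, the following Python sketch computes the sampling variance as described and uses its inverse in a simple weighted mean. It is deliberately not the multivariate mixed model of Eq. 1, which also estimates between-study variance and covariance and was fitted with rma.mv in R; the function names are hypothetical, and the use of the sample (n − 1) variance is an assumption.

```python
from statistics import variance

def sampling_variance(values):
    """Variance of a statistic across raters divided by the number of raters."""
    return variance(values) / len(values)

def inverse_variance_mean(study_means, study_variances):
    """Pooled mean with weights w_i = 1 / v_i (common-effect simplification)."""
    if len(study_means) != len(study_variances):
        raise ValueError("one sampling variance is required per study mean")
    weights = [1.0 / v for v in study_variances]
    weighted_sum = sum(w * m for w, m in zip(weights, study_means))
    return weighted_sum / sum(weights)
```

In this simplification, a study whose raters agree closely (small sampling variance) pulls the pooled mean towards its own value more strongly than a study with widely scattered raters, which is the behavior described above.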

The model was expanded to include categorical moderator variables that could explain, at least in part, the heterogeneity of the means across studies (Madden et al. 2016). The categorical moderator variables included symptom number-size classification (F: few; Nc: numerous coalescent; and Nnc: numerous non-coalescent); maximum severity in the SAD (≤ 50% or > 50%); and organ type (leaf or other). Linear contrasts were used to estimate the mean effect sizes and their standard errors and 95% CIs for each level of the categorical moderator variable (Madden et al. 2016). The gains in precision and accuracy were given by the difference between the estimates using the SAD and those obtained unaided.

Multiple correspondence analysis of the unaided estimates

In addition to the calculation of the meta-analytic estimate for each parameter, we created categorical variables for each regression parameter obtained from the unaided estimates. The intercept was classified as low (≤ 3.4) or high (> 3.4) constant bias; the slope as positive (> 1) or negative (< 1) systematic bias; and precision in three classes: low (≤ 0.84), moderate (0.84 to 0.93), and high (> 0.93). Because the variables were nominal, a multiple correspondence analysis (MCA) was performed to detect and represent underlying structure in the database (Hjellbrekke 2018). The same categorical variables used for the meta-analysis were implemented in this analysis.
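The cut-offs above translate directly into a small classification function. The Python sketch below is illustrative; the function name and labels are hypothetical, and the treatment of a slope exactly equal to 1 (labeled "none" here) is an assumption, since the text defines only the > 1 and < 1 classes.

```python
def classify_unaided(intercept, slope, r):
    """Map a study's unaided regression statistics to the nominal classes."""
    constant_bias = "low" if intercept <= 3.4 else "high"
    if slope > 1:
        systematic_bias = "positive"
    elif slope < 1:
        systematic_bias = "negative"
    else:
        systematic_bias = "none"  # assumption: boundary case not defined in the text
    if r <= 0.84:
        precision = "low"
    elif r <= 0.93:
        precision = "moderate"
    else:
        precision = "high"
    return {
        "constant_bias": constant_bias,
        "systematic_bias": systematic_bias,
        "precision": precision,
    }
```

The resulting labels, together with the symptom, maximum severity, and organ classes, are the kind of nominal table on which an MCA operates.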

Results

What is the magnitude of the benefit to using SADs?

In general, the Pearson’s r values, indicative of precision, were closer to 1 when using SADs (Figs. 3a, b), but the gains (increase in precision) from using the aid varied considerably among the studies, ranging from negative (a loss in precision) in one study to an increase of > 0.2 in five of the studies. The variation in the intercepts (constant bias) was greater when not using the aid compared with the aided estimates and was mostly positive, indicating an overall overestimation of severity (Figs. 3d, e). The intercept values were generally reduced (closer to zero) when using the SAD aid (Fig. 3f). The range of the slope (systematic bias) values was slightly reduced overall but was also increased (greater than 1) in some studies when using the SADs compared with the unaided estimates (Figs. 3g, h, i).

Fig. 3

Density (A, D, G) and dot (B, E, H) plots for the distribution of values of Pearson’s r (A, B, C), indicative of precision of visual estimates of severity, and intercepts (a) (D, E, F) and slopes (b) (G, H, I) as indicative of bias, both unaided (UN) or aided by a standard area diagram (SAD) (n = 72 studies). Plots C, F, and I display the cumulative distribution of the gain (loss) in precision and reduction (increment) in constant (a, intercept) and systematic bias (b, slope). Each dot represents the means across raters for each published study; error bars indicate the standard deviations of the means

Results of the meta-analysis showed that the intercept was the most affected parameter when using the SADs (Fig. 4). Globally, there was a 2.65 reduction in the intercept, from 3.41 (95% confidence interval (CI) 2.78–4.04) to 0.76 (95% CI 0.45–1.03). However, the 95% CI of the estimate did not span zero, suggesting an overall tendency of raters to overestimate severity even when using a SAD (Fig. 4e). The slope was the least affected statistic, slightly reduced from 1.09 (95% CI 1.008–1.183) to 0.966 (95% CI 0.914–1.018). The 95% CI of the meta-analytic estimate of the slope without the SAD did not include one, whereas the CI with the SAD did, suggesting systematic bias was alleviated when using the SAD. Finally, the meta-analytic estimate of precision was significantly increased when using the SAD compared with the unaided estimates. Precision increased, on average, by 0.071, from 0.871 (95% CI 0.849–0.893) to 0.943 (95% CI 0.933–0.953) (Fig. 4c).
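The practical meaning of these estimates can be seen by plugging the meta-analytic intercepts and slopes into the regression line, estimate = a + b × actual. The Python sketch below is only a worked illustration using the point estimates reported above; the function and constant names are hypothetical, and the confidence intervals are ignored.

```python
def predicted_estimate(actual, intercept, slope):
    """Predicted visual estimate (%) for a given actual severity (%)."""
    return intercept + slope * actual

# Meta-analytic point estimates (intercept, slope) reported in the text
UNAIDED = (3.41, 1.09)
SAD_AIDED = (0.76, 0.966)

def overestimation(actual, coefficients):
    """Predicted estimate minus actual severity, in percentage points."""
    intercept, slope = coefficients
    return predicted_estimate(actual, intercept, slope) - actual
```

At an actual severity of 10%, the unaided line predicts an estimate of about 14.3%, whereas the SAD-aided line predicts about 10.4%, so the expected overestimation falls from roughly 4.3 to 0.4 percentage points.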

Fig. 4

Overall estimates using a meta-analytic model (n = 72 studies) for the statistics of linear regression analysis for making inferences regarding accuracy (intercept and slope) (A, B) and precision (Pearson's r) (C) of visual estimates of severity either unaided (UN) or aided by a standard area diagram (SAD). The predicted line (solid line) and respective 95% confidence interval (CI, dashed lines) using the estimates for the statistics are presented in D and E. Error bars indicate the 95% CI of the estimate

Do symptomatic patterns affect bias and precision of unaided estimates?

The three symptom characteristics of the diseases represented in the SADs (size, number and maximum severity) influenced the precision and accuracy of the unaided estimates. Lesion number, coalescence, and maximum severity (Fig. 5) significantly affected the rater statistics. A greater number of lesions that did not coalesce and were smaller in size (Figs. 5A-C), combined with a low maximum severity (Fig. 5D, E, F), resulted in lower precision and greater bias of the estimates. The precision of the estimates for specimens with few lesions (< 20 lesions) was most often greater than the meta-analytic estimate but was more variable for the symptoms characterized by numerous coalescent lesions (Fig. 5A).

Fig. 5

Half-eye density and dot plots for linear regression statistics indicative of bias (slope and intercept) and precision (Pearson’s r) for visual estimates of percentage area affected by disease (severity) obtained unaided (no standard area diagram set, SADs) and conditioned to a classification of symptoms according to size and coalescence (A, B, C) or to the maximum severity represented in the disease SAD (D, E, F). Each grey dot represents a published research study on SADs. The black dot in the middle represents the mean and the solid vertical lines the intervals of a frequentist distribution

There was a tendency for the unaided estimates to be more precise on plant organs other than leaves (Other), which may be related to the characteristics of the lesion which were most often larger and fewer than on leaves (Leaf) in general (Fig. 5G, H, I; Fig. 6).

Fig. 6

Multiple correspondence analysis map for the association among classes of linear regression statistics (a = intercept: smaller or larger than 3.4; b = slope: smaller or larger than 1; and r = Pearson's r: smaller than 0.84, between 0.85 and 0.92 or greater than 0.92) obtained for the relationship between unaided estimates of severity and actual severity summarized across the 72 published studies; and the maximum severity represented in the standard area diagrams (SADs), which was divided into two classes (smaller than or equal to 50% or greater than 50%). The SADs were developed for a specific plant disease and organ type (leaf or other)

The MCA map provides a perspective for the relationships between disease characteristics and organ-related factors and the magnitudes of Pearson’s r and the slope and intercept statistics (Fig. 6). A Pearson’s r > 0.92 clustered with other organs and a maximum severity > 50%, suggesting an association between these variables (Fig. 6 lower left). Lower precision (< 0.84) was more strongly associated with numerous non-coalescent lesions and a maximum severity ≤ 50% (Fig. 6 upper right). Intercepts > 3.4 were associated with numerous coalescent lesions (Fig. 6 lower right). Finally, lower bias was associated with fewer lesions (Fig. 6 upper left).

What factors influence bias and precision when using SADs?

We explored the effect of the two disease-specific factors that had the greatest effect on the unaided estimates (lesion number/symptomatic pattern and maximum severity) on estimation bias and precision using SADs. As expected, the gains in precision and reductions in bias were generally greater for diseases that are characterized by numerous lesions (> 20 small lesions), especially if they do not coalesce and rarely reach 50% severity (Fig. 7). It was not clear whether the true-color image SADs or those that had more illustrations of diseased leaves (or other organ type) increased precision and reduced bias of the estimates given the substantial variability across the studies within each defined group (Fig. 8). Finally, there was an inverse relationship between the unaided estimates with the gain/loss in precision and reduction in the constant and systematic biases (Fig. 9). The more biased and imprecise the unaided estimates, the greater the benefits to using SADs, especially for situations where the lesion classification was numerous compared to those classified as few. In only a single study was the precision lower when using SADs compared to when estimates were made unaided (Fig. 9a).

Fig. 7

Half-eye density and dot plots for the linear regression statistics indicative of bias (slope and intercept) and precision (Pearson’s r) for visual estimates of percent area diseased (severity) made with the aid of standard area diagram sets (SADs) and conditioned to a classification of symptoms according to size and coalescence (F = few; Nc = numerous coalescent and Nnc = numerous non-coalescent) (A, B, C) or maximum severity represented in the SAD (less than or equal to 50% or greater than 50%) (D, E, F). Each grey dot represents a published research study on SADs. The black dot in the middle represents the mean and the solid vertical lines the intervals of a frequentist distribution

Fig. 8

Half-eye density and dot plots for linear regression statistics indicative of bias (slope and intercept) and precision (Pearson’s r) for visual estimates of percent area diseased (severity) made with the aid of standard area diagram sets (SADs) and conditioned to two classes of image type (drawing or photo) (A, B, C) and two classes of number of illustrations in the SADs (one to seven diagrams or eight to 12 diagrams) (D, E, F). Each grey dot represents a published research study on SADs. The black dot in the middle represents the mean and the solid vertical lines the intervals of the distribution

Fig. 9

The gain or loss in linear regression statistics indicative of precision (Pearson's r) (A) and bias (slope and intercept) (B, C) for visual estimates of percent area diseased (severity) made first without, and then with, the aid of standard area diagram sets (SADs) (n = 72 studies). The dashed line in A is indicative of the overall mean gain in precision. Characteristics of lesions are indicated by color (see legend, where F = few, Nc = numerous and coalescent, and Nnc = numerous and non-coalescent). Each dot represents a published research study on SADs

Discussion

Our systematic review builds on our previous research (Del Ponte et al. 2017) and has identified 48 new studies in which SADs were developed and evaluated with the aim of increasing accuracy and precision of visual estimates of disease severity. On average, 10 studies have been published each year since 2017, and the total number of publications on SADs catalogued in our database (as of July 2021) is 153 (a searchable interface for SADs articles is publicly available at https://sadbank.netlify.app/). The results confirm that the use of SADs most often results in improved accuracy and precision of visual estimates. Our analysis contributes to an improved understanding of SADs through the consideration of factors related to SAD design and structure, disease symptoms, and actual severity. Combined with other knowledge on SADs, the results should help increase the accuracy and precision of visual estimates (Franceschi et al. 2020; Belan et al. 2020; de Melo et al. 2020; Pereira et al. 2020).

Through meta-analysis, we quantified the overall benefits of SADs and also identified factors that could explain, at least in part, the variability of the summary statistics from linear regression (intercept, slope and the coefficient of determination) that are indicative of bias and precision. Typically, meta-analysis requires the application of criteria for the search, selection, inclusion, and exclusion of studies. In our case, we focused on the summary statistics of the linear regression coefficients and did not include studies that used LCCC (with the exception of our own data, for which linear regression statistics could be calculated). LCCC is actually considered a more suitable method for the purpose of gauging accuracy and precision (Madden et al. 2007). However, our objective was to select studies where all statistics of bias and precision were reported for individual raters (available for at least six raters), allowing us to calculate the sample variance. This is usually not available when authors use LCCC because the means across raters tend to be reported for statistical comparison between methods (Yadav et al. 2013; Rivera et al. 2020; Castellar et al. 2021). Another reason is that the values of the intercepts and slopes, which are measures of constant and systematic bias, respectively, cannot be directly compared with the location-shift and scale-shift statistics of the LCCC, respectively. Future meta-analytic studies could focus on the use of LCCC statistics to confirm our results. We note that more than 60% of the studies published following our 2017 review paper used LCCC (data not shown) to determine agreement with actual values. Further comprehensive analysis could be performed if the raw data (estimates at the rater level, not only the summary statistics) were shared in the original SAD studies. This is one of the best operating practices when conducting SAD research (Del Ponte et al. 2017) but is still not practiced by many plant pathologists reporting SAD research. We encourage authors of SAD studies to provide the original rater data for unaided and SAD-aided estimates in support of establishing an open science culture in the field (Sparks et al. 2021) and because those data can be valuable in future analyses that further our knowledge and understanding. We support the use of Lin’s CCC as a substitute for, or in addition to, linear regression.

Pearson’s r, a component of LCCC that is indicative of precision, was calculated from the coefficient of determination of the linear regression. The overall increase in precision of the estimates was 0.071 (from 0.87 unaided to 0.943 when using a SAD), but considerable variability was observed across the studies. Similarly, constant bias (intercept), estimated at a value much larger than 1 for unaided estimates, was greatly reduced when using the SADs, suggesting that overestimation was reduced considerably at all severities when using the aid. The systematic bias (slope) was also reduced (closer to 1), suggesting that the overestimation proportional to the magnitude of the actual severity was greater when not using the SAD aid.

The gains when using the SADs are influenced by how accurately raters are able to estimate disease severity unaided (a rater's baseline accuracy), a phenomenon that has been observed for individual raters (Yadav et al. 2013; Braido et al. 2014, 2015; Bock et al. 2015). However, we found the improvement in accuracy varied according to symptom characteristics and maximum severity in the SAD set. To better understand the effect of symptoms, we developed a new key used to classify the SADs into four groups according to symptom number, size and degree of lesion coalescence. Using this approach, we found that SADs were of greatest value when the lesions are numerous, small and do not coalesce, and for those diseases (based on the maximum SAD value) that rarely exceed 50% severity. These characteristics were shown to be strongly associated with each other based on the MCA visualization. Conversely, the smallest gains were in cases where unaided estimates were already accurate and were observed where there were few large lesions or where small lesions grew in size or coalesced, resulting in more than 50% of the specimen surface area being diseased, corroborating earlier reports of the effect of lesion size and number (Amanat 1976; Sherwood 1983; Forbes and Jeger 1987). The illusions discussed by Sherwood et al. (1983) that influenced visual judgement and resulted in greater overestimation obtained using SADs were particularly profound at low disease severities. Our results corroborate this finding; a significant departure from zero was estimated for the overall intercept, an effect that was greatest for the numerous and non-coalescent lesion category. Lesion coalescence may give the impression of larger single lesions, for which raters can estimate disease severity more accurately. Interestingly, we found maximum severity influenced accuracy of the unaided estimates, but this was closely associated with lesion characteristics.

We were not able to detect a significant effect of the number of illustrations in a SAD set affecting the precision and bias of rater estimates, which may be related to maximum severity. We found that larger gains in precision occurred when maximum severity was less than 50%, a range where the number of diagrams tends to be fewer than with severities greater than 50%. When using maximum severity divided by the number of diagrams (data not shown), the effect was more pronounced, as this takes into account the effect of having a smaller severity range if maximum disease is low, which was the case in several of the SAD sets. Dividing the maximum severity by the number of illustrations in the SADs avoids the possibility of a false effect of fewer SADs where maximum severity is low—a disease with a low maximum severity with the same number of illustrations as a disease that attains 100% may provide more guidance for estimate interpolation. In this regard, the small effect we noted corroborates the results of three other studies where the number of illustrations in a SAD set was compared, and there was little effect of the number of illustrations in the SAD sets studied (Braido et al. 2014, 2015; Bock et al. 2015). Our results and those of the earlier studies are indicative that more research is needed to specifically investigate the effects of the number of illustrations in a SAD. Interestingly, newer updated and refined SADs with greater numbers of illustrations have resulted in more accurate data (Franceschi et al. 2020). The Franceschi et al. (2020) result indicates a compelling reason to revisit some of the early-developed SADs and determine whether a new set should be developed.

We were unable to determine if there was an effect of image type (photo or drawing of diseased specimens) in this study. This may be due to any effect being particularly subtle. Schwanck and Del Ponte (2014) noted an effect of SAD color on both measures of bias, which were significantly improved using black and white images, although numerically the effect was slight. However, the greater accuracy achieved from new SAD sets (Franceschi et al. 2020) may in part be due to the image type used, although several other factors we have noted above could be involved too. There will inevitably be some factors that have major effects on accuracy (including lesion number and size, and rater effects), while many other factors may play minor roles, and collectively we should strive to optimize these as well. Combined, these factors can offer further improvements to the accuracy of SAD-aided visual assessments. However, even with SADs, we are limited by the innate ability of raters to accurately estimate a value, and improvements in accuracy will have diminishing returns as the major sources of error are understood and resolved through optimized SAD design, appropriate training, and instruction.

In conclusion, we applied a meta-analysis to disease severity assessment data to explore how characteristics of the pathosystem and SADs affect precision and constant and systematic biases. Based on the results, we affirm the value of SADs for reducing bias and increasing precision of visual assessments of disease severity and also identify some of the characteristics of disease symptoms where the tool has greater or lesser value as an assessment aid. Specifically, we determined that the number of illustrations in SADs should take the maximum severity into account and more biased and less precise severity estimates are associated with diseases characterized by small and numerous lesions.