Introduction

In Brazil, bacterial leaf streak caused by Xanthomonas vasicola pv. vasculorum Cobb was initially observed in corn-producing regions of western Paraná, southern Brazil, in 2018 (Leite Junior et al. 2018; Leite et al. 2019). At the same time, the number of outbreaks of the disease at epidemic levels has increased in major cereal-producing countries such as the USA (Lang et al. 2017) and Argentina (Broders 2017; Plazas et al. 2018). The increase in occurrences represents a potential threat to the global maize crop.

The initial symptoms of the disease consist of small circular spots (2–3 mm) that develop into elongated, narrow lesions surrounded by a yellowish halo and restricted to the interveinal regions of the leaves. As the disease develops, the lesions coalesce and can cover large areas of the leaves and bracts of the corn plants (Leite Junior et al. 2018). In some cases, the symptoms are similar to those of gray leaf spot of corn, which is caused by Cercospora spp. The two diseases can be differentiated by the shape of the lesion edges, which are wavy for the bacterial disease.

Since the confirmation of the disease in the state of Paraná, new cases have been reported in different corn-producing regions of the state. Bacterial dissemination over short distances can occur by wind, rainwater splashing and irrigation (Leite Junior et al. 2018), and dissemination over long distances can occur through infected seeds (Arias et al. 2018) and probably through residues on agricultural equipment (Leite Junior et al. 2018).

Little is known about the effects of management strategies on disease control. Disease quantification is fundamental both to evaluating management strategies and to epidemiological studies. Accurate and precise disease estimation is important for estimating yield losses as well as for breeding resistant cultivars (Nita et al. 2003; Nutter and Esker 2006). Reliable results require standardized methods for quantifying disease severity in the field (Amorim and Bergamin Filho 2018). In this context, diagrammatic scales have been an appropriate tool, mainly because they reduce the subjectivity of severity evaluations and improve the accuracy of estimates (Chiang et al. 2016, 2017a, b).

The use of scales aims to increase accuracy and repeatability among evaluators; that is, different evaluators should assign severity values to the same leaf that are close to the actual value and close to the values assigned by other evaluators. Bock et al. (2010) noted that inaccurate and/or imprecise disease severity evaluation might lead to faulty conclusions and, consequently, to incorrect disease management decisions. Precision and accuracy are commonly assessed using the coefficient of determination, the angular coefficient, and the linear regression intercept (Capucho et al. 2010; Godoy et al. 2006). To assess the level of agreement among the evaluators, Lin’s coefficient of agreement can be used (Bardsley and Ngugi 2013; Dolinski et al. 2017; Fantin et al. 2018; Nita et al. 2003).

Thus, the present study aimed to: (i) develop and validate a diagrammatic scale for the evaluation of the severity of bacterial leaf streak by adopting two methods: linear regression and Lin’s statistics, and (ii) assess inter- and intrarater reliability and accuracy of experienced and inexperienced evaluators.

Material and methods

Sample acquisition

To develop the diagrammatic scale, 450 adult maize leaves naturally infected by Xanthomonas vasicola pv. vasculorum and showing leaf streak symptoms were collected between growth stages V12 and R2 from a field in the municipality of Cafelândia, Paraná state, Brazil, during the 2018/2019 season.

Image processing

The corn leaves were photographed individually at an image resolution of 300 dpi. Disease severity was measured using Assess 2.0® image analysis software (American Phytopathological Society, St. Paul, MN, USA). The necrotic lesions and the translucent yellow halo were considered the diseased area. The percentage of leaf area affected by the disease, as obtained with the software, was taken as the actual disease severity.
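The severity measure described above is a ratio of diseased area to total leaf area. The sketch below illustrates that ratio on a toy image; it is not how Assess 2.0 works internally, and the function name and the boolean masks are hypothetical inputs assumed to come from a prior segmentation step.

```python
def severity_percent(leaf_mask, diseased_mask):
    """Percentage of leaf pixels flagged as diseased.

    Both masks are equal-sized 2D lists of booleans (hypothetical inputs).
    Diseased pixels that fall outside the leaf are ignored.
    """
    leaf_pixels = 0
    diseased_pixels = 0
    for leaf_row, diseased_row in zip(leaf_mask, diseased_mask):
        for is_leaf, is_diseased in zip(leaf_row, diseased_row):
            if is_leaf:
                leaf_pixels += 1
                if is_diseased:
                    diseased_pixels += 1
    if leaf_pixels == 0:
        raise ValueError("leaf mask contains no leaf pixels")
    return 100.0 * diseased_pixels / leaf_pixels


# Toy 2 x 4 image: 6 leaf pixels, 3 of them diseased (necrosis or halo).
leaf = [[True, True, True, False],
        [True, True, True, False]]
diseased = [[True, False, False, False],
            [True, True, False, True]]  # the last pixel lies off the leaf
print(severity_percent(leaf, diseased))  # prints 50.0
```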

Diagrammatic scale development

The maximum and minimum severity levels found in the field were used as the limits of the scale, and the intermediate levels were developed following the Weber-Fechner law of visual acuity (Horsfall and Cowling 1978). Eight levels of disease severity were established in the diagrammatic scale. The scale was developed using Adobe Photoshop® software.
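The Weber-Fechner law implies that perceived differences follow the logarithm of the stimulus, so scale levels are usually spaced more densely at low severities. The sketch below shows one simple way to generate such levels, with equal steps on a log scale between a minimum and a maximum. It illustrates the principle only: the function name is hypothetical, and the levels actually adopted in a scale may be adjusted by the authors rather than taken directly from such a formula.

```python
import math


def log_spaced_levels(minimum, maximum, n_levels):
    """Return n_levels severities (%) equally spaced on a logarithmic scale."""
    if minimum <= 0 or maximum <= minimum or n_levels < 2:
        raise ValueError("need 0 < minimum < maximum and at least 2 levels")
    step = (math.log(maximum) - math.log(minimum)) / (n_levels - 1)
    return [math.exp(math.log(minimum) + i * step) for i in range(n_levels)]


# Illustrative limits only: eight levels between 0.5% and 67% severity.
levels = log_spaced_levels(0.5, 67.0, 8)
print([round(value, 1) for value in levels])
```

Each level is a constant multiple of the previous one, which is why the steps look small at low severity and large at high severity.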

Validation

For the proposed scale validation, visual estimates of the severity of leaves were performed by ten evaluators. The group was composed of five evaluators with disease evaluation experience (EX) (I, II, III, IV and V) and five evaluators without experience (IN) (VI, VII, VIII, IX and X). Before the test, the evaluators were instructed how to perform the assessment. A total of 50 images of maize leaves with bacterial leaf streak symptoms were uploaded to a Microsoft PowerPoint presentation. One leaf image was placed on each slide. The assessment test was performed in two steps. First, the evaluators estimated the severity level without the proposed scale. One day later, the test was performed again with the proposed scale. To validate the benefits of using the proposed scale, the data were analyzed by two methods: Lin’s statistics and linear regression.

Validation with Lin’s statistics

Lin’s statistics were used to assess the intra-rater reliability, defined as the agreement between the visual estimates and the actual severity as evaluated without and with the scale by the same evaluator. The measure is also called “repeatability” (Madden et al. 2007). Three parameters were calculated: Lin’s concordance correlation coefficient, bias and the Pearson correlation coefficient.

Lin’s concordance correlation coefficient (LCC) measures the extent to which two sets of observations align on the line of concordance (45°) (Barnhart et al. 2007; Lin et al. 2002) and combines the precision and accuracy of each evaluator (Barnhart et al. 2002; Nita et al. 2003). The equation can be represented as ρc = Cb × r, where r is the Pearson product-moment correlation coefficient (which indicates precision) and Cb is the bias coefficient. The bias coefficient represents the accuracy and measures how far the fitted line is from the concordance line. When Cb = 1, the fitted line coincides with the concordance line, that is, there is no systematic bias between the estimates and the actual severity. The bias coefficient can be calculated as:

$$ C_b = \frac{2}{\omega + \frac{1}{\omega} + \upsilon^2} $$

where quantity ω is the scale shift and υ is the location shift relative to scale. The scale shift is obtained by ω = σy/σx, where σy and σx are the standard deviations of the evaluator disease estimates and the actual severity, respectively. The location shift relative to scale is calculated as:

$$ \upsilon = \frac{\left|\mu_y - \mu_x\right|}{\sqrt{\sigma_y \times \sigma_x}} $$

where μy and μx are the means of the evaluator estimates and the actual severity, respectively. Perfect agreement of the estimates is found when ρc = 1, Cb = 1, the scale shift = 1 and the location shift = 0. Departures from these values indicate bias, imprecision and loss of accuracy (Lin et al. 2002; Nita et al. 2003).
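As an illustration of how these quantities fit together, the following sketch computes r, ω, υ, Cb and ρc for one evaluator from paired actual and estimated severities. It is a plain-Python illustration, not the epiR routine used in the study; the function name and the sample values are hypothetical, and population (1/n) moments are assumed.

```python
import math


def lin_ccc(actual, estimated):
    """Lin's concordance statistics for paired severities (x = actual, y = estimate)."""
    n = len(actual)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(actual) / n
    mean_y = sum(estimated) / n
    # Population (1/n) variances and covariance (an assumption of this sketch).
    var_x = sum((x - mean_x) ** 2 for x in actual) / n
    var_y = sum((y - mean_y) ** 2 for y in estimated) / n
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(actual, estimated)) / n
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    r = cov_xy / (sd_x * sd_y)                            # precision
    omega = sd_y / sd_x                                   # scale shift
    upsilon = abs(mean_y - mean_x) / math.sqrt(sd_y * sd_x)  # location shift
    c_b = 2.0 / (omega + 1.0 / omega + upsilon ** 2)      # bias coefficient
    return {"r": r, "scale_shift": omega, "location_shift": upsilon,
            "C_b": c_b, "rho_c": c_b * r}


# Hypothetical evaluator who overestimates every leaf by one percentage point.
actual = [1.0, 5.0, 10.0, 20.0, 40.0]
estimated = [2.0, 6.0, 11.0, 21.0, 41.0]
for name, value in lin_ccc(actual, estimated).items():
    print(f"{name}: {value:.4f}")
```

In this example the estimates are perfectly correlated with the actual values, so any departure of ρc from r comes from the constant offset captured by the location shift.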

The interrater reliability represents the agreement between evaluators in estimating the disease severity on the same leaf. In other words, it represents reproducibility (Madden et al. 2007). The overall concordance correlation coefficient (OCCC) was used to calculate the agreement between multiple evaluators (Barnhart et al. 2002, 2007) and was calculated for the IN and EX groups without and with the scale.
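A minimal sketch of the overall concordance correlation coefficient is given below, using the form in which the OCCC is a weighted average of all pairwise concordance coefficients. With two raters it reduces to Lin's coefficient. This is an illustration in plain Python rather than the code used in the study; the function name and sample ratings are hypothetical, and population (1/n) moments are assumed.

```python
def occc(ratings):
    """Overall concordance correlation coefficient.

    ratings: one list per rater, all scoring the same leaves in the same order.
    """
    n_raters = len(ratings)
    n = len(ratings[0])
    if n_raters < 2 or n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("need >= 2 raters scoring the same >= 2 leaves")
    means = [sum(r) / n for r in ratings]
    variances = [sum((v - m) ** 2 for v in r) / n
                 for r, m in zip(ratings, means)]
    cov_sum = 0.0        # sum of pairwise covariances
    mean_diff_sq = 0.0   # sum of squared pairwise mean differences
    for j in range(n_raters - 1):
        for k in range(j + 1, n_raters):
            cov_sum += sum((a - means[j]) * (b - means[k])
                           for a, b in zip(ratings[j], ratings[k])) / n
            mean_diff_sq += (means[j] - means[k]) ** 2
    denominator = (n_raters - 1) * sum(variances) + mean_diff_sq
    if denominator == 0:
        raise ValueError("all ratings are identical constants")
    return 2.0 * cov_sum / denominator


# Hypothetical severity estimates (%) from three raters on five leaves.
raters = [
    [1.0, 5.0, 10.0, 20.0, 40.0],
    [2.0, 6.0, 11.0, 21.0, 41.0],
    [1.0, 4.0, 12.0, 18.0, 43.0],
]
print(round(occc(raters), 4))
```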

Validation with linear regression

The regression analysis was performed based on the visual estimates without and with the proposed scale and the actual severity. The accuracy and precision of the estimates were calculated. The accuracy was calculated by linear regression using the image severity as the independent variable (x), and the visual estimates as the dependent variable (y).

The parameters were tested by a t-test applied to the intercept (β0) and the angular coefficient (β1). The null hypotheses were β0 = 0 and β1 = 1, at a significance level of 0.05. Intercept values (β0) greater than zero (>0) indicate an estimated severity above the actual severity, and values below zero (<0) indicate an estimated severity below the actual severity (Nutter and Schultz 1995). For angular coefficient values, those near negative one (−1) and positive one (+1) were considered underestimated and overestimated, respectively.
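The two t-tests can be sketched as follows. This is a minimal plain-Python illustration rather than the R workflow used in the study; the function name and the sample data are hypothetical. It returns the fitted coefficients, the t statistics for H0: β0 = 0 and H0: β1 = 1, and the degrees of freedom, which would then be compared with a t distribution at the 0.05 level.

```python
import math


def regression_bias_tests(actual, estimated):
    """Ordinary least squares of estimates (y) on actual severity (x),
    with t statistics for H0: intercept = 0 and H0: slope = 1."""
    n = len(actual)
    if n != len(estimated) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 values")
    mean_x = sum(actual) / n
    mean_y = sum(estimated) / n
    sxx = sum((x - mean_x) ** 2 for x in actual)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, estimated))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(actual, estimated))
    df = n - 2
    sigma2 = residual_ss / df                      # residual variance
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    return {
        "intercept": intercept,
        "slope": slope,
        "t_intercept": intercept / se_intercept,   # tests beta0 = 0
        "t_slope": (slope - 1.0) / se_slope,       # tests beta1 = 1
        "df": df,
    }


# Hypothetical actual and estimated severities (%) for eight leaves.
actual = [0.5, 1.0, 3.0, 6.0, 11.0, 18.0, 37.0, 67.0]
estimated = [1.0, 2.0, 5.0, 8.0, 15.0, 20.0, 45.0, 70.0]
print(regression_bias_tests(actual, estimated))
```

Note that the slope is tested against 1, not against 0 as in the default output of most regression routines, because the reference is the 1:1 line between estimates and actual severity.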

The precision was estimated by the coefficient of determination (R2) and the absolute error, as calculated by the differences between the visual estimate and the image analysis (Campbell and Madden 1990; Kranz 1988; Nutter and Schultz 1995).
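A short sketch of these two precision measures is given below. It assumes that the error for each leaf is the visual estimate minus the image-analysis value; the function name and sample data are hypothetical, and the code is an illustration rather than the analysis script used in the study.

```python
def precision_summary(actual, estimated):
    """Coefficient of determination and per-leaf errors (estimate - actual)."""
    n = len(actual)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(actual) / n
    mean_y = sum(estimated) / n
    sxx = sum((x - mean_x) ** 2 for x in actual)
    syy = sum((y - mean_y) ** 2 for y in estimated)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, estimated))
    # For simple linear regression, R2 equals the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    errors = [y - x for x, y in zip(actual, estimated)]
    mean_abs_error = sum(abs(e) for e in errors) / n
    return {"r_squared": r_squared, "errors": errors,
            "mean_abs_error": mean_abs_error}


# Hypothetical actual and estimated severities (%) for four leaves.
summary = precision_summary([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 2.0, 5.0])
print(summary["errors"])  # prints [1.0, 2.0, 0.0, 2.0]
print(round(summary["r_squared"], 3), summary["mean_abs_error"])
```

Plotting the per-leaf errors against the actual severity shows whether the errors concentrate in a particular severity range.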

Data analysis

The analyses were performed with R software (R Core Team 2017) using the “epiR” package (Stevenson et al. 2017). The graphs were plotted with the “ggplot2” package (Wickham 2016).

Results

Development of scale

The actual severity obtained from the 450 adult leaves collected in the field ranged between 0.5 and 68.5%. Based on the Weber-Fechner law of visual acuity, eight levels were determined: 0.5, 1, 3, 6, 11, 18, 37 and 67% (Fig. 1).

Fig. 1 - Diagrammatic scale for evaluating the severity of bacterial leaf streak caused by Xanthomonas vasicola pv. vasculorum on corn leaves. The severity levels were 0.5%, 1%, 3%, 6%, 11%, 18%, 37% and 67%

Intra-rater reliability

Without the use of the proposed scale, the t-test of linear coefficients consistently indicated variations in the evaluator estimates (P value <0.01). Overall, inexperienced evaluators overestimated the disease severity without the scale (β0 = 5.7), whereas experienced evaluators underestimated it (β0 = −2.1) both without and with the scale (Table 1).

Table 1 - Linear regression coefficients of the estimated severity of bacterial leaf streak in corn leaves evaluated by 10 evaluators with and without the proposed scale

The estimates of the inexperienced evaluators varied between −0.23 and 17.15, and those of the experienced evaluators varied between −7.59 and 5.6. The regression coefficient (β1) was significantly different from 1 for all evaluators, ranging from 0.58 to 1.60 (Table 1). The coefficient of determination (R2), which explains the variation in the assessments, of the IN and EX evaluators without the scale was 0.66 and 0.81, respectively. With the use of the proposed scale, the average accuracy was 0.88 and 0.97 for IN and EX, respectively.

The values of the intercept (β0) were not significant, except for those of evaluators III and X. Evaluators I, V and VI gave estimates that were higher than the actual severity, with intercepts (β0) equal to 0.86, 1.46 and 0.08, respectively. The scale aid coefficient (β1) was significant (p < 0.01) for all the evaluators.

R2 - Determination coefficient values for the regression analyses.

β0 - Intercept of the linear regression.

β1 - Slope of the linear regression line.

Lin’s concordance correlation coefficient (ρc) varied from 0.4 to 0.92 without the scale and from 0.86 to 0.98 with the scale. The results showed improvement in the agreement between estimated and actual severity with the scale. With the scale, the mean agreement was higher for the EX evaluators than for the IN evaluators (0.99 and 0.98, respectively). The accuracy (Cb) of estimates varied from 0.68 to 1.00 and 0.94 to 1.00, without and with the scale, respectively (Table 2). Perfect accuracy, meaning that the fitted line does not deviate from the 45° line of concordance, is obtained when Cb = 1. Total agreement occurs when ρc = 1.

Table 2 - Lin’s concordance coefficients of the estimated severity of bacterial leaf streak in corn leaves evaluated by 10 evaluators with and without the proposed scale

E.shift = Scale shift relative to the perfect match (1 = perfect match between x and y).

L.shift = Location shift relative to the perfect match (0 = perfect match between x and y).

Cb = Bias correction (Cb) measures how much the best-fit line deviates from the 45° line. No deviation from the 45° line occurs when Cb = 1. Cb is a measure of accuracy calculated from the scale shift and the location shift.

ρc = Concordance correlation coefficient described by Lin (1989), which combines precision and accuracy to measure agreement with the true values.

Inter-rater reliability

The inter-rater reliability was measured by correlations between pairs of raters (Figs. 2 and 3). The results showed a stronger correlation between evaluators when the scale was used. The coefficients of determination were 0.43 and 0.97 without the scale and 0.84 and 0.99 with the scale for the IN and EX groups, respectively. The results suggest that the concordance between evaluators improved with the use of the scale.

Fig. 2 - Pearson correlation coefficients of the visual estimates of bacterial leaf streak severity on 50 leaves of corn from paired evaluators without the proposed scale aid

Fig. 3 - Pearson correlation coefficients of the visual estimates of bacterial leaf streak severity on 50 leaves of corn from paired evaluators with the proposed scale aid

The overall concordance correlation coefficient (OCCC) showed higher concordance between experienced evaluators without and with the scale, at 0.73 and 0.97, respectively. The scale also improved the precision and accuracy in comparisons between all evaluators (Table 3).

Table 3 - Interevaluator reliability (reproducibility) of visual estimates of bacterial leaf streak severity on 50 leaves of maize by 10 evaluators measured by the overall concordance correlation coefficient

OCCC - Overall concordance correlation coefficient.

The results suggest that the scale helps to promote standardization of the assessment among evaluators. The agreement among the estimates from experienced evaluators was more improved by the scale than that among inexperienced evaluators.

Discussion

Without the use of the proposed scale, experienced evaluators underestimated the disease severity, and evaluators without experience overestimated it. This differs from the findings of Godoy et al. (2006), in which both experienced and inexperienced evaluators overestimated the severity of Asian soybean rust. Evaluators also overestimated the severity of corn white spot (Capucho et al. 2010), brown eye spot in coffee fruits (Azevedo de Paula et al. 2016) and cotton target spot (Fantin et al. 2018).

Using the proposed scale, only evaluators III and V underestimated the severity of the disease. Evaluators also underestimated brown spot lesions in coffee fruits using a scale (Azevedo de Paula et al. 2016). In general, there was an improvement in accuracy and a reduction in systematic evaluation error with the use of the scale. In addition, the agreement of evaluations among evaluators also improved by standardizing the estimates of the evaluators. In the work of Capucho et al. (2010), the estimations of 70% of evaluators of white spot severity in maize showed variations without the use of a scale. In this study, with the aid of the scale, the estimates of all the evaluators were consistent.

Errors in evaluations may be related to the intrinsic characteristics of the assessed disease as well as the evaluation method. Kranz (1988) reported that the lesion size and shape, the number of lesions per area and the coloring can influence estimates. In addition, the actual severity level also affects the accuracy of the estimates. Bock et al. (2009) observed that the error in the estimates varied according to the actual severity. In the evaluation of foliar symptoms of citrus canker, the evaluators underestimated the actual severity (which was between 5 and 10%) by 10%. dos Santos et al. (2010) observed that evaluators overestimated disease severity when the severity was below 20%. Accuracy and concordance improved with increasing leaf disease severity (Bock et al. 2013). In a different study, overestimation was found to be inversely proportional to the logarithm of severity and directly proportional to the number of lesions (Sherwood et al. 1983).

According to Bock et al. (2010), severity is overestimated when there are numerous small lesions. Bock et al. (2008a, b) studied assessments of the incidence (number of lesions), the percentage of the necrotic area and the percentage of the necrotic area plus the chlorotic halo. The results showed that the evaluators tended to overestimate the disease severity and not the number of lesions. At the same severity, the error of the evaluators is more pronounced when there are many small lesions than when there are fewer lesions. For large lesions, the estimation accuracy is higher than that for smaller lesions (González-Domínguez et al. 2014). In the evaluation of soybean target spot, Godoy et al. (2006) and Celoto and Papa (2010) observed that the overestimation of severity was not pronounced.

All these works are related to diseases that present circular lesions. For striated lesions, characteristic of bacterial leaf streak of corn, the results suggest that the accuracy of the evaluators was compromised, since the presence of the chlorotic halo makes the estimation difficult. Bock et al. (2008a, b) observed that reproducibility in the estimates was compromised when the necrotic plus chlorotic area was evaluated. Reproducibility was improved when only the necrotic area was evaluated. On the other hand, the presence of a halo in the characteristic symptoms of cotton target spot did not compromise the evaluators’ precision or accuracy (Fantin et al. 2018). The authors mention that the agreement between the experienced evaluators was excellent.

According to Belan et al. (2014), the presence of a halo compromises the evaluation due to subjectivity in the assessment of the differences between shades of yellow and green. In addition, Azevedo de Paula et al. (2016) noted that brown spot evaluation in red grains is more difficult than in yellow grains. Bardsley and Ngugi (2013) noted that early education on disease symptoms improved the accuracy of inexperienced evaluators in evaluating the severity of bacterial spot on peach and nectarine.

According to Bardsley and Ngugi (2013), Bock et al. (2016), Chiang et al. (2016) and Nutter and Schultz (1995), evaluator training can reduce errors in severity estimation. Bardsley and Ngugi (2013) tested evaluators with different levels of experience and found that the degree of experience affected accuracy, rather than reliability, in estimates. The authors noted that the assessment could be done by inexperienced and experienced evaluators with sufficient instruction. In addition, the inter-rater reliability was higher when the rater used the proposed scale, demonstrating that the scale improved rater estimate agreement. This means that evaluating the same leaf with a scale may result in closer estimates by different evaluators. The same results were found by Yadav et al. (2013), Azevedo de Paula et al. (2016) and Dolinski et al. (2017). Yadav et al. (2013) noted that the scale provides a method for evaluators to evaluate disease more uniformly than without the scale.

In the present study, without the use of the proposed scale, the agreement between inexperienced evaluators was lower than that between experienced evaluators, confirming the previous premise. The agreement between evaluators was higher with the use of the scale, demonstrating the standardization of estimates among the evaluators. The proposed diagrammatic scale for the evaluation of the severity of bacterial leaf streak improved the accuracy, precision and reproducibility of the evaluations, proving it to be an important tool for the evaluation of this disease in corn. Accurate and reproducible estimates of disease severity prevent problems associated with less appropriate disease control strategies. Moreover, the agreement among the estimates of different raters standardizes the assessment and allows the comparison of results under new management strategies, such as the selection of tolerant cultivars, among others.

Conclusions

The use of the diagrammatic scale to evaluate the severity of bacterial leaf streak contributed to the precision and accuracy of the evaluation of corn disease and improved the inter- and intrarater reliability of experienced and inexperienced evaluators.