Introduction

Site-specific weed control and management could economically benefit farmers and consumers without diminishing weed control efficiency (Pinter et al. 2003; Slaughter et al. 2008). Another reason to reduce the amount of applied herbicides is the evolution of weed resistance to herbicides (Marshall and Moss 2008). Site-specific weed management has reduced herbicide use by 11–90 % without affecting crop yield (Feyaerts and van Gool 2001; Gerhards and Christensen 2003). Weed distribution in fields is non-uniform and confined to patches of varying size within fields as well as along field borders (Gerhards et al. 1997; Weis et al. 2008). Since weed populations also vary significantly between fields, the need for site-specific weed monitoring and management is emphasized (Moran et al. 2004).

Non-selective weed detection and control can be implemented by detection of green vegetation (Biller 1998). This approach can be applied to entire fields before crop emergence or between the crop rows after emergence (Moran et al. 1997; Alchanatis et al. 2005). Selective sensing methods are designed to detect the shape of weed leaves against the soil background and, thus, can be applied only in early growth stages, before leaves overlap (Weis et al. 2008). Selective ground-based sensing methods can also rely upon the spectral characteristics of the weeds as well as of the crop. Canopy reflectance data will probably include shaded leaves and might include stems, flowers and fruits, against a background that is most likely soil, possibly partly shaded and of varying humidity.

Plant spectra are mainly affected by leaf pigmentation in the visible region (400–700 nm) (Yoder and Pettigrew-Crosby 1995), while the near infrared (NIR) region (700–1 100 nm) is highly influenced by leaf and canopy structure, which is affected by phenology as well as by species (Gausman 1985). A dicotyledonous leaf has more air spaces in its spongy mesophyll tissue than a monocotyledonous leaf (Raven et al. 2005) of the same thickness and age, resulting in higher reflectance in the NIR region (Gausman 1985). The red-edge region is the slope connecting the low red and high NIR reflectance values in the spectrum of vegetation and is an important indicator for spectral separation of different plant species (Herrmann et al. 2011; Shapira et al. 2013).

The first step required in order to spectrally distinguish between crops and weeds is to obtain continuous spectra of pure plant material for each species or group of species. This can be implemented by using high spatial and spectral resolutions, as shown by Vrindts et al. (2002), who demonstrated the need to employ relative reflectance values in order to classify crops and weeds and to minimize the effect of varying lighting conditions on the spectral data. Lopez-Granados et al. (2008) classified the ground-level spectral reflectance of wheat, four grass weeds and soil, and concluded that one sampling date per growing season, when phenological distinction is maximal, can provide high quality classification. It is important to mention that relying on phenology for spectral separation will be less efficient when the optimal time for herbicide application precedes the date of maximal phenological variability among crop and weeds. Slaughter et al. (2008) noted, in their review, that most studies were conducted under ideal conditions, with no overlapping of crop and weeds, and resulted in classification accuracies of 65–95 %. Zwiggelaar (1998) mentioned, in his review, that using selected wavelengths for discriminating between crops and weeds in a row environment had not yet been demonstrated, and that analyzing images with a limited number of wavelengths might not be sufficient. Okamoto et al. (2007) worked in the visible and NIR regions in order to separate sugar beet from four weeds, two broadleaf and two grass species. The validation results showed 75–97 % success for the five classes for sampled pure vegetation pixels; predictions for an entire image and ground truth analyses were not mentioned. In other studies that applied hyperspectral cameras, the soil background was excluded and the classifications of crop and weeds were applied to young plants with one or two layers of leaves (Borregaard et al. 2000; Feyaerts and van Gool 2001; Nieuwenhuizen et al. 2010).

The current research used ground-level image spectroscopy data, with high spectral and spatial resolutions, for detecting annual grasses and broadleaf weeds in wheat fields. Specific objectives were threefold: (1) to choose the best class determinations, for this dataset, in order to separate broadleaf weed (BLW), grass weed (GW) and wheat; (2) to find the most important spectral bands needed for this separation; and (3) to examine the potential of using high spectral and spatial resolution ground-level reflectance from the wheat fields to predict categories of wheat and weeds.

Materials and methods

Study area

Field measurements were performed in rainfed as well as irrigated wheat experimental plots in winter 2009 at the Gilat Research Center in the northwest Negev, Israel (31°20′ N, 34°40′ E). This region is defined as semi-arid with a short rainy season (November–April; Har Gil et al. 2011). Soils are Calcic Xerosols with sandy loam texture formed from alluvium and loess on shallow hills with average elevation of 80–150 m above sea level (Kafkafi and Bonfil 2008).

Field work and pre-processing

Ground level images were obtained by the Spectral Camera HS (V10E, Specim, Oulu, Finland), a pushbroom sensor with 1 600 pixels per line and 849 narrow spectral bands (~0.67 nm wide) in the visible and NIR regions. The images were obtained from 2 h before until 2 h after midday in order to minimize changes in the solar zenith angle and shadow effects. Since herbicides are usually applied before closure of the crop canopy (Thorp and Tian 2004), images were acquired 10–54 days after the emergence of the wheat. The growth stages of the wheat were Zadoks 12–47, known as seedling growth to flag leaf sheath opening (Zadoks et al. 1974), and the canopy height was up to 0.35 m. The camera was mounted on a tripod, 1.35 m above the top of the canopy, pointing down to cover an area of 0.5 by 0.5 m delimited by a metal frame at the canopy level (Fig. 1). At this height, the spatial resolution was approximately 0.5 mm. The square frame was divided into 16 equally sized squares to visually estimate the relative coverage of different features, such as wheat, weeds and soil. The assessment was carried out for each of the squares and accumulated, with a 6.25 % weight per square, to cover the entire area surrounded by the frame. All assessments were performed by the same person. The relative coverage of the BLW category included all broadleaf plants in the frame area, mainly Chenopodium, Malva, potato, knapweed and chrysanthemum. The relative coverage of the GW category included all grasses other than wheat, mainly Lolium rigidum and Hordeum glaucum.

Fig. 1

Setup of the hyperspectral camera in a wheat field. In the small frame: an example of a sampled image including broadleaf and grass weeds, soil, and wheat

A Coolpix S10 (Nikon) digital camera, mounted at the same height as the hyperspectral camera, was used to acquire true-color (RGB) images. These photos were used as references for the cross-validation classification, as well as for obtaining ground truth. It should be noted that, since the integration time of each scene produced by the pushbroom instrument was 28–35 s (depending on the frame rate and number of lines acquired), and since all images were acquired in an open field, gusts of wind could shift the relative location of leaves. Consequently, a slight difference might exist between the hyperspectral image and the RGB photo.

The image preprocessing included the subtraction of the sensor electronic noise (dark current) and radiometric correction by the AISATools software (Specim, Oulu, Finland). The images were then converted to relative reflectance values in the ENVI 4.3 (EXELIS, Boulder, Colorado, USA) software environment. This process was based on the flat field calibration method, white referencing to a barium sulfate (BaSO4) panel positioned on the frame underneath the camera (Hatchell 1999), as presented in Fig. 1. The barium sulfate panel was prepared by pressing BaSO4 powder into a 20 mm deep round box with a diameter of 52 mm; the powder was smoothed with glass to create an even surface. The panel was re-smoothed at the beginning of every working day and checked after every image. The flat field calibration was performed for each image with its own white reference, which also provided atmospheric correction. The images were spectrally resampled to 91 bands by averaging the original spectra every 5 nm in the range of 400–850 nm. The images were rectangularly clipped to include only the area within the frame boundaries. In cases where the frame was not parallel to the image borders, the images were clipped to include the maximal area while excluding the frame itself. In this way, 21 images were acquired for further processing and statistical analysis.
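As an illustration of the two numerical steps in this chain, the following minimal Python sketch shows flat-field white referencing and the 5 nm averaging. The array layout, the location of the panel in the image and the bin centers are assumptions; the actual processing was done in AISATools and ENVI.

```python
# Minimal sketch of the reflectance conversion and 5 nm resampling
# (illustrative; not the AISATools/ENVI implementation used in the study).
import numpy as np

def to_reflectance(cube, panel_rows):
    """Flat-field calibration: divide each pixel spectrum by the mean
    spectrum of the BaSO4 white panel visible in the same image."""
    # cube: (lines, samples, bands) radiometrically corrected data
    white = cube[panel_rows].mean(axis=(0, 1))        # (bands,)
    return cube / white

def resample_5nm(cube, wavelengths):
    """Average the ~0.67 nm native bands into 91 bands, 400-850 nm."""
    centers = np.arange(400.0, 851.0, 5.0)            # 91 band centers
    bands = [cube[..., np.abs(wavelengths - c) <= 2.5].mean(axis=-1)
             for c in centers]
    return np.stack(bands, axis=-1)                   # (lines, samples, 91)
```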

PLS-DA analysis

The partial least squares discriminant analysis (PLS-DA) applies a partial least squares (PLS) model to the discriminant function analysis problem in order to allow maximal separation among classes (Musumarra et al. 2004). Since the PLS method was not initially designed for classification, it has rarely been used for this purpose; nevertheless, PLS-DA can produce plausible separation (Barker and Rayens 2003). In order to relate the PLS (numerical) to the DA (categorical) in a two-class case, each sample is assigned one of two arbitrary numbers indicating the class to which it belongs (Xie et al. 2007); these two numbers are the only acceptable values of the single artificial variable. When more than two classes are to be separated, one binary artificial variable per class is needed to indicate the class membership of each sample (Musumarra et al. 2004).
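As a minimal sketch of this indicator coding (Python is used for illustration only; the study itself used Matlab, see below), one binary artificial variable can be built per class:

```python
# Indicator (dummy) coding: one binary artificial variable per class.
import numpy as np

def dummy_code(labels, classes):
    """Return an (n_samples, n_classes) 0/1 indicator matrix Y."""
    Y = np.zeros((len(labels), len(classes)))
    for i, label in enumerate(labels):
        Y[i, classes.index(label)] = 1.0
    return Y

classes = ["BLW", "GW", "soil", "wheat"]            # categories used below
labels = ["BLW", "soil", "wheat", "GW", "wheat"]    # toy sample labels
Y = dummy_code(labels, classes)
# PLS regresses Y on the spectra (X-block); each sample's predicted
# row approximates its class-membership indicators.
```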

A total of 1 857 spectra from pure pixels (i.e., pixels containing one class) were arbitrarily selected from the 21 raw images, as presented in Table 1. Each spectrum was obtained from one pixel as a vector of reflectance values in all 91 bands. The BLW category included spectra of three species: Chenopodium, Malva and potato. The GW category included spectra of Lolium rigidum and Hordeum glaucum. The PLS-DA was applied to the categories and not to specific species. Approximately half of the spectra were obtained from sunlit pixels and the rest from shaded pixels. In Table 1, the data are divided into four classes: BLW, GW, soil and wheat, with 799, 364, 330 and 364 spectra, respectively. Six models were examined:

Table 1 Distribution of the 1 857 selected pixels amongst the classes and images
  • Model #1 separates three classes: BLW, G (including GW and wheat) and soil;

  • Model #2 separates two classes: BLW and G (including GW and wheat);

  • Model #3 separates four classes: BLW, GW, soil and wheat;

  • Model #4 separates three classes: BLW, GW and wheat;

  • Model #5 separates the classes from Model #3 and divides each into sunlit and shaded pixels (i.e., eight classes);

  • Model #6 separates the classes from Model #4 and divides each into sunlit and shaded pixels (i.e., six classes).

The sample distribution to classes for each of the models is presented in Table 1.

In order to evaluate the relative importance of each band in the chosen PLS-DA model, the variable importance in projection (VIP) after Wold et al. (1993) was computed. The VIP summarizes the importance of each predictor for the projections used to find the latent variables of the PLS model (Chong and Jun 2005; Cohen et al. 2010). VIP values are evaluated on a "the higher the better" basis, where VIP = 1 is considered the putative threshold since it is the average of the VIP values of all the predictors in the PLS model. Therefore, in order to separate between wheat and weeds, as well as to determine the most important wavelengths for the separation, each PLS-DA model was cross-validated (including VIP analysis) and the best model was used to perform a prediction for each of the images. This process was applied in a Matlab 7.6 (MathWorks, Natick, Massachusetts, USA) environment with the PLS-toolbox (Eigenvector, Wenatchee, Washington, USA). Building the PLS-DA models included pre-processing the X-block by mean centering, which reduces variation within the data (Navalon et al. 1999): the mean reflectance value of each wavelength is subtracted from the corresponding reflectance values. The data were then validated by cross-validation, as is customary for empirical models (Borregaard et al. 2000). The cross-validation used every 10th sample in order to set the number of latent variables applied in the model (Nason 1996; Wu et al. 1997). Classification quality assessment methods were applied to the resulting confusion matrices in order to find the most suitable model for the prediction of weeds in a wheat field, and the chosen model was applied for the prediction of all images. As an intermediate product, the PLS-DA prediction resulted in a two-dimensional image per class, in which the value of each pixel is the probability that the pixel belongs to that class. A pixel was assigned to the class with the highest probability. The threshold for classification was set to 0.3, meaning that pixels with probability values smaller than 0.3 for all classes were defined as unclassified. The PLS-DA classification prediction thus resulted in 21 two-dimensional images, in which each pixel is related to one of the classes or defined as unclassified.
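The following sketch illustrates this workflow with scikit-learn's PLSRegression standing in for the Matlab PLS-toolbox. The number of latent variables, the VIP implementation details and the synthetic data are assumptions; mean centering and the 0.3 threshold follow the text.

```python
# PLS-DA with mean centering, VIP scores and the 0.3 rejection threshold
# (a sketch; the study used Matlab 7.6 with the PLS-toolbox).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def vip_scores(pls):
    """VIP per wavelength; values above 1 mark important predictors."""
    t, w, q = pls.x_scores_, pls.x_weights_, pls.y_loadings_
    p = w.shape[0]                                        # number of bands
    ss = np.sum(t ** 2, axis=0) * np.sum(q ** 2, axis=0)  # variance per LV
    w2 = (w / np.linalg.norm(w, axis=0, keepdims=True)) ** 2
    return np.sqrt(p * (w2 @ ss) / ss.sum())

rng = np.random.default_rng(0)
X = rng.random((200, 91))               # stand-in spectra (samples x bands)
Y = np.eye(4)[rng.integers(0, 4, 200)]  # stand-in 0/1 indicator matrix

# n_components would be set by the every-10th-sample cross-validation;
# scale=False keeps mean centering without variance scaling.
pls = PLSRegression(n_components=10, scale=False).fit(X, Y)
vip = vip_scores(pls)                   # one VIP value per band

pred = pls.predict(X)                   # one membership value per class
winner = pred.argmax(axis=1)            # most probable class per pixel
winner[pred.max(axis=1) < 0.3] = -1     # below threshold: unclassified
```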

Classification quality assessment

The quality of the PLS-DA models was compared based on the cross-validation confusion matrices. Cohen's Kappa was computed as presented and defined by Cohen (1960): the proportion of agreement after chance agreement is removed from consideration. Cohen's Kappa is a unit-less value ranging from 1 for perfect agreement to −1 for complete disagreement. Computation of Cohen's Kappa is based on a confusion matrix and is presented in Eq. (1):

$$ Kappa = \frac{d - q}{N - q} $$
(1)

where d is the number of correctly classified samples, q is the chance agreement, obtained by multiplying the total of each row of the confusion matrix by the total of the corresponding column, summing over all classes and dividing by the total number of samples, and N is the total number of samples. The confidence limit (CL), in percent, was calculated for the overall accuracy as shown by Foody (2008) and presented in Eq. (2):

$$ CL = \pm t_{N,d-1} \sqrt{\frac{p(1-p)}{N-1}} $$
(2)

where p is the overall accuracy, t_{N,d−1} is the value of a two-tailed 95 % t test for d samples, N is the total number of samples, and d is the number of ground truth pixels that were correctly classified. The CL of the total accuracy allows comparisons between models based on total accuracy and, therefore, shows whether a model is significantly better or worse than the others (Foody 2008). The quality of paired PLS-DA classification models was compared as described by Congalton and Mead (1986) and presented in Eq. (3):

$$ Z = \frac{Kappa_1 - Kappa_2}{\sqrt{Var_1 + Var_2}} $$
(3)

where Z is the normal curve deviation; if Z > 1.96 or Z < −1.96, the difference between the confusion matrices is significant at 95 % probability (2.58 and −2.58 are the thresholds for significance at 99 %); Kappa is defined in Eq. (1); and Var is the variance of the confusion matrix, as shown by Hudson and Ramm (1987). The comparison was done for two confusion matrices at a time, and their Cohen's Kappa and variance were calculated in order to determine whether they are significantly different from one another.
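A compact sketch of Eqs. (1)–(3), computed from a square confusion matrix, could look as follows; the per-matrix Kappa variance of Hudson and Ramm (1987) is taken here as an input rather than computed.

```python
# Cohen's Kappa (Eq. 1), confidence limit of overall accuracy (Eq. 2)
# and the Z statistic (Eq. 3), from a square confusion matrix.
import numpy as np

def cohens_kappa(cm):
    cm = np.asarray(cm, dtype=float)
    N = cm.sum()                                     # total samples
    d = np.trace(cm)                                 # correctly classified
    q = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / N  # chance agreement
    return (d - q) / (N - q)

def confidence_limit(cm, t=1.96):
    """CL of overall accuracy (Foody 2008); t is the two-tailed value."""
    cm = np.asarray(cm, dtype=float)
    N = cm.sum()
    p = np.trace(cm) / N                             # overall accuracy
    return t * np.sqrt(p * (1.0 - p) / (N - 1.0))

def z_statistic(kappa1, var1, kappa2, var2):
    """|Z| > 1.96 marks a significant difference at 95 % probability."""
    return (kappa1 - kappa2) / np.sqrt(var1 + var2)
```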

The quality of the prediction by the PLS-DA model was assessed with ground truth data. For each spectral image, 50 pixels not included in the calibration data set were randomly selected as ground truth data. These pixels were identified in the digital photos taken in the field and assigned to one of the classes. The classification results, in comparison with the ground truth, were analyzed as confusion matrices. The quality of the classification was assessed by Cohen's Kappa coefficient, overall accuracy, user's accuracy and producer's accuracy for each confusion matrix. The classification quality was assessed for each image separately (i.e., 50 ground truth pixels) and for all the images together (i.e., 1 050 ground truth pixels).
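The per-class measures can be read from the same matrix; the orientation assumed below (rows as prediction, columns as ground truth) is illustrative.

```python
# Per-class accuracies from a confusion matrix with rows = prediction
# and columns = ground truth (orientation assumed for illustration).
import numpy as np

def class_accuracies(cm):
    cm = np.asarray(cm, dtype=float)
    users = np.diag(cm) / cm.sum(axis=1)       # correct / predicted total
    producers = np.diag(cm) / cm.sum(axis=0)   # correct / reference total
    return users, producers
```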

Relative coverage assessment

Relative coverage assessment was performed by three methods: field estimation, counting pixels in the PLS-DA classification results, and a simple classification decision tree (DT). The field estimation was described earlier, and the classes were BLW, GW, soil and wheat. The final results of the PLS-DA classification for each image were used to count the pixels related to each of the four classes (i.e., BLW, GW, soil and wheat) and to divide this number by the total number of pixels in the image in order to obtain relative coverage. The DT classified each image into one of five classes: sunlit vegetation, shaded vegetation, specularly reflected vegetation, sunlit soil and shaded soil. The DT classification was applied in the ENVI software environment for all the images and resulted in 21 two-dimensional images, in which each pixel is related to one of the five classes.
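Counting classified pixels per class, as done for the PLS-DA results, reduces to a few lines; the integer class codes are illustrative, and unclassified pixels still count towards the total.

```python
# Relative coverage from a classified image: pixels per class divided by
# the total number of pixels in the image (class codes are illustrative).
import numpy as np

def relative_coverage(class_image, n_classes=4):
    total = class_image.size
    return [np.count_nonzero(class_image == c) / total
            for c in range(n_classes)]
```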

The DT, presented in Fig. 2, is based on conditions applied to the relative reflectance values of three narrow bands (i.e., 470, 555 and 670 nm) that reflect differences between the classes. The first condition checks whether the reflectance values in the 470 and 555 nm bands are both lower than 0.05. If the condition is fulfilled, the pixel is assumed to be shaded, and the next condition determines whether it is soil or vegetation. If the condition is not fulfilled, the pixel is assumed to be sunlit, and the other two conditions determine whether it is sunlit soil, specularly reflected vegetation, or sunlit vegetation.
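A sketch of this tree is given below. Only the shade test is specified in the text, so every threshold other than the 0.05 shade condition is an invented placeholder, not the authors' actual rule.

```python
# Sketch of the Fig. 2 decision tree. Only the shade test
# (rho470 < 0.05 and rho555 < 0.05) is given in the text; the remaining
# conditions are placeholders invented for illustration.
def classify_pixel(r470, r555, r670):
    if r470 < 0.05 and r555 < 0.05:                  # shaded pixel
        # placeholder: a green peak above red suggests vegetation
        return "shaded vegetation" if r555 > r670 else "shaded soil"
    if r555 > 0.40 and r670 > 0.40:                  # placeholder: very
        return "specularly reflected vegetation"     # bright in all bands
    return "sunlit vegetation" if r555 > r670 else "sunlit soil"
```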

Fig. 2

Decision tree for separating five classes: sunlit vegetation, shaded vegetation, specularly reflected vegetation, sunlit soil, and shaded soil. Each condition, written in the rectangular boxes, has two options: yes or no. Each option leads to another condition or to a classification product. The ρ stands for the relative reflectance value at the specified wavelength (in nm)

Results and discussion

Figure 3 presents the averaged reflectance values, with standard deviations, of the eight classes: four obtained from sunlit pure pixels and four from shaded pure pixels. Note that, as expected, the reflectance values of the sunlit pixels (Fig. 3a) are higher than those of the shaded ones (Fig. 3b). These eight classes were used for testing the different class compositions in the six PLS-DA models.

Fig. 3

The averaged spectra of 8 classes along with the standard deviation: a 4 classes obtained from sunlit pure pixels; and b 4 classes obtained from shaded pure pixels

Tables 2, 3, 4, 5, 6, 7 present the cross-validation confusion matrices of the six PLS-DA models. These six models form three pairs; within each pair the classes are the same, with and without soil. In order to find the best model, the confusion matrices were compared by the normal curve deviation (Z; Eq. 3) for all model pairs (Table 8). In Models #1 and #2, the vegetation classes were either broadleaf or grass (the latter including the wheat data); therefore, these models did not distinguish between crop and weeds, and Model #1 differed from Model #2 only by an additional soil class. These models were analyzed and presented in order to learn more about the importance of soil as a distinguishable class. The overall accuracy of Model #1 is higher than that of Model #2 (Tables 2 and 3), and the difference between these confusion matrices is significant (Table 8). Similarly, in the other pairs of models, i.e., Models #3 and #4 (Tables 4 and 5) and Models #5 and #6 (Tables 6 and 7), the only difference within each pair is an additional soil class. The overall accuracy is higher for each model containing a soil class, and the difference between the paired models is significant, as presented in Table 8. These three pairs show that the total accuracy is significantly better (at more than 99 % confidence) when soil is added as a class. Since the canopy in the current study is denser than reported in the literature for similar sensors (Borregaard et al. 2000; Feyaerts and van Gool 2001; Okamoto et al. 2007; Nieuwenhuizen et al. 2010), and since the current study is expected to be a step towards up-scaling (soil will be included in the larger pixel), the influence of soil on classification quality was explored. One might assume that, for the paired models (i.e., Models #1 and #2, #3 and #4, and #5 and #6), the improvement in the overall accuracy of the models including a soil class is driven mainly by the accuracies of the soil classes themselves; however, the user's and producer's accuracies in Tables 2, 3, 4, 5, 6, 7 show that this assumption is incorrect.

Table 2 Model #1 cross validation of PLS-DA classification model of three classes: broadleaf weeds, grasses (grass weeds and wheat), and soil
Table 3 Model #2 cross validation of PLS-DA classification model of two classes: broadleaf weeds and grasses (grass weeds and wheat)
Table 4 Model #3 cross validation of PLS-DA classification model of four classes: broadleaf weeds, grass weeds, soil and wheat
Table 5 Model #4 cross validation of PLS-DA classification model of three classes: broadleaf weeds, grass weeds, and wheat
Table 6 Model #5 cross validation of PLS-DA classification model of eight classes: broadleaf weeds, grass weeds, soil and wheat, all sunlit as well as shaded
Table 7 Model #6 cross validation of PLS-DA classification model of six classes: broadleaf weeds, grass weeds, and wheat, all sunlit as well as shaded
Table 8 Normal curve deviation (Z) values of coupled PLS-DA classification models

Models #5 and #6 were also analyzed in order to demonstrate the effect of sunlit and shaded pixels. In most cases, the sunlit classes yielded better user's and producer's accuracies than the shaded classes (Tables 6 and 7). The user's accuracy values of shaded soil and, to a lesser extent, shaded GW in Model #5 are similar to those of the soil and GW classes in Model #3. When comparing the user's accuracies of the sunlit classes of Model #5 to the classes of Model #3, the values are also similar. Since the images used for the prediction include 21–65 % shade and shaded vegetation out of the total image area (obtained by the DT; not presented), Model #3, which combines sunlit and shaded pixels, is a more efficient classifier than Model #5. The distribution of sunlit and shaded pixels is presented in Table 1. Table 8 shows the significant superiority of Model #3 over Models #5 and #6. Model #3 comprises, in addition to wheat and weeds, a soil class, and each class combines data from sunlit and shaded pixels. Therefore, it was chosen as the model for applying the prediction to the images with the aim of identifying weeds in a wheat field.

It is important to emphasize that, although the cross-validation was performed on pure pixels (Fig. 3), the spectral samples of the vegetation classes were acquired from the canopy, and those of the soil class were acquired next to the canopy or in its shade. As mentioned above, leaves are semi-transparent to NIR radiation; therefore, the spectrum of soil shaded by vegetation will most likely include vegetation signals. Hence, a spectrum from each of the classes also includes the spectral signals of neighboring objects (Borregaard et al. 2000), e.g., leaves and soil, GW and BLW, sunlit as well as shaded. These objects can belong to the same class as the sample, to another class, or even to a target that is not included in any of the classes (e.g., stems, specular reflection and stones). Therefore, each of the presented models includes some amount of error caused by field conditions. Consequently, Model #3, which includes sunlit and shaded spectra in each of its classes, is believed to provide prediction results that reflect real field conditions.

In order to find the important wavelengths for separating the classes in Model #3, the VIP analysis was applied to each of the classes and is presented in Fig. 4. For the three vegetation classes, the most important spectral region is the red-edge, with the highest peak at 730 nm, while for the soil the highest peak is at 685 nm, on the boundary between the red and red-edge spectral regions. This makes sense since soil, unlike vegetation, does not absorb radiation in the red region for photosynthesis (Gausman 1985), and 685 nm is the wavelength of the reflectance minimum for the three sunlit vegetation classes (data not presented). This is in agreement with the perfect producer's accuracy of the soil and sunlit soil classes in Tables 2, 4 and 6. Spectra of shaded soil in a vegetated area can include elements of vegetation (e.g., absorption in the red region and enhanced reflectance in the NIR region); this might be part of the reason why the soil and shaded soil classes do not have perfect user's accuracy in these three tables. For the wheat and GW classes, besides the red-edge, the blue and green regions, respectively, are also important. In most cases, the GW plants, in the field as well as in the digital pictures, have a lighter green hue than the wheat plants, which appear more bluish. This combination of four wavelength regions (i.e., blue, green, red and red-edge) was the most important for separating the four classes of Model #3.

Fig. 4

Variable importance in projection (VIP) values of model #3 for the four classes. Note that the threshold for the VIP values is equal to 1

Few hyperspectral satellites are active today (e.g., Hyperion and CHRIS, with two more planned, HyspIRI and EnMAP), and they might not meet the spatial requirements for site-specific weed management. Currently, only two operational multi-spectral satellites, WorldView-2 and RapidEye, meet the four-band requirement (including a red-edge band) along with high spatial resolution for site-specific agricultural applications. Two forthcoming satellites, the vegetation and environmental new micro spacecraft (VENμS) and Sentinel-2, are to be launched in 2014 at the earliest. There is a need to further explore the influence of reduced spatial resolution on the quality of prediction, in other words, to deal with the mixed pixel issue in order to find the ideal pixel size for weed analysis. A dedicated airborne (hyper- or multi-spectral) mission is suggested for calibrating several common weeds and crops in relation to the economic benefits of weed control. Although satellites can cover several fields in one image, high spatial resolution that allows identifying early weed infestation, similar to ground level, seems to be beyond the goals of current or near-future satellites. Therefore, air- and ground-level applications based on similar sensors, with higher spatial resolution, are also a course to concentrate on in the near future.

Table 9 presents the total accuracy and its confidence interval, Cohen's Kappa, and the user's and producer's accuracies obtained from the 21 confusion matrices computed in order to assess the quality of prediction by Model #3 for the 21 images. The total accuracies range from 54 to 90 % and, when considering the confidence intervals, from 40 to 98 %. The highest total accuracies, 88 and 90 %, were both obtained in images that did not contain all four classes to begin with, or in which the randomly picked ground truth pixels did not cover all available classes, as can be seen by the 0 values in both user's and producer's accuracies. Correlating the total accuracy to vegetation coverage obtained by field assessment, PLS-DA and DT, for the images presented in Table 9, resulted in very weak, non-significant, negative relations with R2 < 0.025. Therefore, it can be assumed that Model #3 predictions are not influenced by the range of vegetation cover in the current data. The vegetation cover obtained by the three methods mentioned above was 35–95 % by field assessment, 28–100 % by the PLS-DA model, and 15–99 % by the DT.

Table 9 Ground truth results (50 pixels per image) by model #3, and vegetation coverage for each of the images (entire image)

The confusion matrix for all the ground truth points together is presented in Table 10. The 1 050 ground truth pixels were distributed almost evenly among the classes, and the number of unclassified pixels is negligible. Soil is the class with the highest prediction accuracy. BLW has a user's accuracy of 81 % but a producer's accuracy of only 52 %; for wheat it is the other way around, with a user's accuracy of 60 % and a producer's accuracy of 79 %. Pixels of BLW, GW and soil that are mistakenly classified are, in most cases, classified as wheat (68, 49 and 21 %, respectively); therefore, the user's accuracy of wheat is the lowest. In Table 6, most of the mis-classifications are between shaded vegetation classes and, since in Model #3 each class combines sunlit and shaded data, this might be the source of the mis-classifications in the vegetation classes presented in Table 10. The 1 857 cross-validation pixels (used for calibration) were collected from homogeneous regions and were not picked from stones, stems, specular reflections or upside-down leaves, nor adjacent to leaf edges. The 1 050 ground truth pixels were randomly selected and, therefore, might belong to targets whose spectra are similar to more than one class (e.g., a pixel on the edge of a leaf). In order to generalize the model, spectral data were collected in several plots and growth stages. Tyystjarvi et al. (2011) obtained the best results for species classification by training and testing leaf fluorescence measurements that were obtained on the same date. Therefore, it is assumed that the classification results could have been improved if the cross-validation had been applied to data obtained from one date, one plot, or even one image. However, the applied system must be generalized in the sense of location and not limited to a specific growth stage before crop closure (i.e., the optimal time for herbicide application).

Table 10 Prediction of pixels used for validation from all the images together by model #3 with a confidence interval of ± 2.7 % for the overall accuracy, and Kappa = 0.63

Figure 5 displays two images and their classification results by Model #3. Figure 5a presents the image with the lowest total accuracy and Fig. 5c presents an image with above-average values (Table 9). The relative coverage assessed in the field, for the two images respectively, was 30 and 25 % GW, 40 and 50 % wheat, 27 and 22 % soil, and 3 and 3 % BLW. The DT analysis resulted in similar relative coverage, with differences not higher than 4 % for each of the five classes. Comparison of the two prediction confusion matrices showed a significant difference (at more than 95 % confidence), based on Z = 2.27. The distributions of ground truth pixels in the two images are similar, with differences of 1, 3, 6 and 4 pixels for the four classes BLW, GW, soil and wheat, respectively. Both images were acquired in the same field on the same day, less than half an hour apart. Therefore, besides human assessment or other errors or inaccuracies and the limitations of the model itself, the difference between the confusion matrices is assumed to be related to the fact that the model was cross-validated on a variety of growth stages and environmental conditions combined with in-field spectral variability.

Fig. 5

a An example of a hyperspectral ground-level image with the lowest total accuracy and b Its classification; c An example of an image with above-average total accuracy and d Its classification

The number of cross-validation pixels obtained from each image explained only 11.3 and 0.2 % of the variation in the total accuracy and Cohen's Kappa of the Model #3 predictions, respectively, neither relation being significant. Therefore, it is assumed that the influence of the number of cross-validation pixels on Cohen's Kappa, as well as on the total accuracy, is negligible, and that the Model #3 prediction results for the 21 images are not influenced by the distribution of the cross-validation pixels among the images. Figure 6 presents the user's and producer's accuracy of each class for each image in relation to the number of cross-validation pixels of the same class acquired from that image. The user's accuracies of the GW and soil classes by Model #3 were significantly related to the number of cross-validation pixels acquired in the image, which explained 39 and 22 % of their variation, respectively. The producer's accuracy of the BLW class by Model #3 was significantly related to the number of cross-validation pixels acquired in the image, which explained 47 % of its variation. When only the images from which cross-validation pixels were acquired were included in the analysis, the R2 values for user's accuracy of BLW, GW, soil and wheat were 0.01, 0.27, 0.83 and 0.1, respectively, and the R2 values for producer's accuracy were 0.29, 0.06, 0.31 and 0.03, respectively. Comparing these R2 values to those in Fig. 6 shows a reduction in the correlation between the number of pixels and both the user's and producer's accuracies for the three vegetation classes. Therefore, the vegetation classification quality is almost unaffected by the distribution of pixels among the images. For soil, the tendency is the opposite and, therefore, it seems important to distribute the cross-validation pixels for soil among more images.

Fig. 6

The number of cross-validation pixels of each class acquired from each image in relation to: a User's accuracy; and b Producer's accuracy, of the class for each image

In order to decide whether to apply weed control, the relative coverage of weeds in the field needs to be known (Slaughter et al. 2008). Figure 5b and d present examples of classification results by Model #3; such images were used to obtain the relative coverage of each of the four classes. There is a positive, significant correlation between the relative coverage obtained by the PLS-DA classification and by field assessment (Fig. 7). BLW and soil, which show relatively better user's accuracy results (Table 9), provide better correlation to the field assessment. Therefore, it can be assumed that the relative coverage of the four classes can be assessed by Model #3, within the limitations presented above. On top of human error in assessment and mis-classifications of the model, differences in relative coverage can partly be an outcome of the clipping process: since the frame used for field assessment was not always parallel to the image borders, narrow triangles of the frame area were clipped out of the images that were used to predict and to ground-truth the PLS-DA model.

Fig. 7

Relating relative coverage of BLW, GW, soil, and wheat by field assessment and by model #3 classification results

Conclusions

Classification of wheat and weeds was the aim of the current study. Pixels were selected from ground-level images and used to build six PLS-DA models. The best model was found to be the one that included soil as an additional class and combined sunlit and shaded pixels in each of its four classes: BLW, GW, soil and wheat. The important wavelengths for each of the classes of the best model were obtained by the VIP method. The best model was applied to the images, and ground truth was obtained using the RGB photos. The overall ground truth resulted in an overall accuracy of 72 %. These results indicate that differentiation between wheat and weeds is possible using PLS-DA, potentially contributing to practical site-specific herbicide application. Specific conclusions are:

  • A composition of four classes, BLW, GW, soil and wheat, was the best for weed detection for the current data set.

  • Sunlit vegetation can be better separated into classes than shaded vegetation.

  • The red-edge is the most important region for separation among wheat, BLW and GW. For wheat and GW, the blue and green regions, respectively, are also important. For separating soil from vegetation, the boundary between the red and red-edge regions is spectrally the most important.

  • Although the model's cross-validation and ground truth were acquired from heterogeneous data, the model obtained reasonable results and, therefore, is potentially applicable.

  • High spectral and spatial resolutions can provide separation between wheat and weeds based on spectral data alone.

Future work aimed at economic thresholds for applying weed control should concentrate on one of two directions: covering wider areas, with coarser spatial resolution, by satellites (e.g., WorldView-2, RapidEye, VENμS and Sentinel-2) while exploring the mixed pixel issue; or covering relatively smaller areas, with fine spatial resolution, by ground-level sensors in order to deal with the sunlit and shaded parts of the canopy and soil separately, or to better understand their interaction.