Introduction

Decay caused by fungi is among the main defects affecting the post-harvest handling and marketing of citrus fruit. Infected fruit can be neither stored for long periods nor transported over long distances for export, since a small number of decayed fruit can infect a whole consignment. Fungal infections therefore cause great economic losses to the citrus industry if damaged fruit are not detected early, with Penicillium sp. being the fungi responsible for most post-harvest losses in citrus packinghouses (Eckert and Eaks 1989). In current packing lines, decayed fruit are detected visually by trained operators who examine the fruit as it passes under ultraviolet (UV) light. However, this method is subjective and the UV exposure is potentially harmful to the operators' skin. Automatic machine vision systems are a possible way of avoiding these drawbacks.

Technology based on colour cameras has spread rapidly for detecting skin damage in fruit and vegetables (Zude 2008; Cubero et al. 2011) and is a common technique for the inspection of citrus fruit. For instance, Kondo et al. (2000) studied the possibility of estimating the sugar and acid content of ‘Iyokan’ oranges using a machine vision system and neural networks. Slaughter et al. (2008) developed a non-contact method of detecting freeze-damaged oranges based on UV fluorescence, and López-García et al. (2010) used multivariate image analysis to detect peel diseases in citrus fruit. Nevertheless, decay lesions are difficult to detect using standard artificial vision systems since they are hardly visible to the human eye and, therefore, to standard colour cameras (Fig. 1). Blasco et al. (2007) used visible computer vision to detect different types of damage in citrus fruit, including decay by green mould. While the success rate for other defects was high, the detection of decay was lower than 60 % because the damage caused by this disease to the citrus skin is not clearly visible before sporulation. On the other hand, following the fluorescence technique used by human inspectors in the industry, Kurita et al. (2009) tried to detect decay in citrus using two lighting systems (visible and UV) that were alternated while the fruit was in the field of view of the camera.

Fig. 1 Sound orange (left) and the same fruit showing decay caused by P. digitatum (right)

Hyperspectral sensors have been used successfully as an alternative for detecting non-visible damage on fruit (Lorente et al. 2012). In the particular case of citrus fruit, several studies have addressed the detection of decay lesions (Qin et al. 2009, 2012; Gómez-Sanchis et al. 2012). A hyperspectral image consists of a large number of consecutive monochromatic images of the same scene, one for each wavelength. It is therefore very important to select only those bands that carry the most relevant information and to discard those that do not contribute significantly to the results because they contain redundant information or are highly correlated with other bands. Numerous feature selection methods are available to reduce dimensionality while retaining most of the original information in fewer bands.

For example, Gómez-Sanchis et al. (2008) evaluated four feature selection methods with the aim of selecting an optimal set of wavelengths in the range 460–1,020 nm for detecting decay in citrus fruit. Xing et al. (2005) used principal component analysis (PCA) to reduce data from a hyperspectral imaging system (400–1,000 nm) for detecting bruises on ‘Golden Delicious’ apples. PCA was also used by Liu et al. (2005) to obtain spectral features for the detection of chilling injury in cucumbers imaged using a hyperspectral system (447–951 nm). More recently, Li et al. (2011) used PCA to select the most discriminant wavelengths in the range 400–1,000 nm for detecting various common skin defects on oranges. Partial least squares (PLS) and artificial neural networks (ANN) are other techniques commonly used for feature selection. ElMasry et al. (2008) determined some important wavelengths for detecting bruises in ‘McIntosh’ apples using PLS on hyperspectral images in the range 400–1,000 nm, and ElMasry et al. (2009) used ANN to classify apples into injured and normal classes and to detect changes in firmness due to chilling injury by selecting optimal wavelengths.

Objective

The method used by Lorente et al. (2011) to select the most relevant spectral features for detecting decay in citrus fruit was based on the area under the receiver operating characteristic (ROC) curve, which is a promising measure of the quality of a binary classifier. That work presented a novel approach that extends its use to multiclass problems such as the automatic discrimination of decay lesions in citrus fruit. This problem is still under research and is very important from the agricultural point of view, since the damage caused by fungi is hardly visible to the naked eye or to standard vision systems and can spread quickly to other sound fruit during storage. This work aims to compare our novel approach to ROC feature selection with other common feature selection techniques for agricultural multiclass classification problems. As a benchmark, we use the detection of decay in citrus fruit by hyperspectral imaging, selecting an optimal set of wavelengths that discriminates effectively between common defects and decay lesions. The comparison is intended to establish whether, in terms of classification accuracy, the ROC method is a promising technique for multiclass classification problems relative to other commonly used methods.

Material and Methods

Image Acquisition

The hyperspectral imaging system used was based on liquid crystal tunable filters (LCTF; e.g. Lorente et al. 2011). The system consists of a monochrome camera (CoolSNAP ES, Photometrics, Tucson, USA), a lens providing a uniform focus in the working range (Xenoplan 1.4/17MM, Jos. Schneider Optische Werke GmbH, Bad Kreuznach, Germany), and two LCTF (CRI Varispec VIS07 and NIR07, UK) sensitive to the visible (400–720 nm) and near-infrared (NIR, 650–1,100 nm) ranges, respectively. The scene was illuminated by halogen lamps placed inside a hemispherical aluminium dome.

For the hyperspectral images, a total of 240 ‘Clemenules’ mandarins (Citrus clementina Hort. ex Tanaka) obtained from a local producer were used: 60 without visible damage, 60 presenting external scars, 60 inoculated with spores of Penicillium digitatum and 60 inoculated with spores of Penicillium italicum. The inoculation was performed using a suspension of spores with a concentration of \( 10^6 \) spores/ml for both fungi, which is sufficient to cause infection in laboratory conditions (Palou et al. 2001). The images were acquired by manually presenting the damaged area of the fruit to the camera. A total of 240 hyperspectral images were taken in the range 460–1,020 nm, with a 10-nm spectral resolution. Each sample pattern in the labelled set consisted of 74 spectral features associated with each pixel (the reflectance level for each acquired band, i.e. the grey level in each monochromatic image, plus several spectral indexes) and a class label assigned manually by a human expert. Five different classes were considered in this work: green sound skin (GS), orange sound skin (OS), defective skin by scars (SC), decay caused by P. digitatum (PD) and decay caused by P. italicum (PI).

Feature Selection Methods

The performance of the method based on the area under the ROC curve is compared with that of other common feature selection methods. The methods included in this comparative study are: correlation analysis (Rodgers and Nicewander 1988), mutual information (Bonnlander and Weigend 1994), Fisher’s discriminant analysis (Venables and Ripley 2002), t test (Li et al. 2006), Wilks’ lambda (Ouardighi et al. 2007), Bhattacharyya distance (Choi and Lee 2003), minimum redundancy maximum relevance difference criterion (MRMRd) (Ponsa and López 2007), minimum redundancy maximum relevance quotient criterion (MRMRq) (Peng et al. 2005) and Kullback–Leibler divergence (Kullback 1987; Abe et al. 2000). These feature selection techniques were chosen because they are commonly applied to the analysis of hyperspectral images in the fields of pattern recognition and remote sensing, although they have not been used before for automatic fruit or vegetable inspection using computer vision. This work therefore also examines whether they are suitable and accurate methods for this kind of problem.

To obtain a feature selection for each method, two steps were followed: (1) obtaining a ranking of the features ordered according to their discriminant relevance and (2) selecting an optimal number of features from that ranking. The feature selection methods and the classification procedure used in this work were implemented using Matlab 7.9 (The Mathworks, Inc., Natick, USA).

Step I. Obtainment of a feature ranking

The first step is to obtain a feature ranking for each class. The feature selection techniques studied are intended for binary classification problems, but this work deals with problems involving more than two classes. Therefore, the one vs. all approach (Rifkin and Klautau 2004) is employed to obtain a feature ranking for each class that maximises the separation between that class and the others. A single global feature ranking is then derived for each method from the relevance values of the partial rankings for each class. These relevance values are weighted in proportion to the relative importance of each class in the problem and combined using Eq. 1.

$$ \overline{r}_j = \frac{\sum_{k=1}^{N} r_{jk} \, w_k}{\sum_{k=1}^{N} w_k} $$
(1)

where \( \overline{r}_j \) is the global relevance of feature \( x_j \), \( N \) is the number of different classes, \( r_{jk} \) is the relevance value of feature \( x_j \) from the partial ranking for the \( k \)th class, and \( w_k \) is the weight for the \( k \)th class.

After obtaining the global relevance of each feature, each input feature is ranked.
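The one vs. all ranking and the weighted combination of Eq. 1 can be illustrated with a minimal Python sketch. It is not the Matlab implementation used in this work: the function names, the toy data and the choice of absolute Pearson correlation as the per-class relevance measure are assumptions made only for illustration, and any of the ranking criteria listed above could be plugged in through the `relevance` argument. Sorting in descending order of global relevance is also an assumption.

```python
from math import sqrt

def abs_correlation(values, binary_labels):
    """Absolute Pearson correlation between one feature and a 0/1 label vector."""
    n = len(values)
    mean_v = sum(values) / n
    mean_l = sum(binary_labels) / n
    cov = sum((v - mean_v) * (l - mean_l) for v, l in zip(values, binary_labels))
    var_v = sum((v - mean_v) ** 2 for v in values)
    var_l = sum((l - mean_l) ** 2 for l in binary_labels)
    if var_v == 0 or var_l == 0:
        return 0.0  # a constant feature or label carries no information
    return abs(cov) / sqrt(var_v * var_l)

def global_ranking(samples, labels, class_weights, relevance=abs_correlation):
    """Return (ranking, global_relevance) following Eq. 1.

    samples: list of feature vectors; labels: list of class names;
    class_weights: dict mapping class name -> weight w_k.
    """
    n_features = len(samples[0])
    total_weight = sum(class_weights.values())
    global_relevance = []
    for j in range(n_features):
        column = [s[j] for s in samples]
        weighted_sum = 0.0
        for cls, w in class_weights.items():
            # One vs. all: class k is positive, every other class is negative.
            one_vs_all = [1 if lab == cls else 0 for lab in labels]
            weighted_sum += relevance(column, one_vs_all) * w
        global_relevance.append(weighted_sum / total_weight)
    # Features sorted from most to least relevant (assumed ordering).
    ranking = sorted(range(n_features), key=lambda j: global_relevance[j], reverse=True)
    return ranking, global_relevance

# Hypothetical toy data: feature 0 separates "PD" from the rest, feature 1 does not.
samples = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.3], [0.2, 0.6], [0.15, 0.25], [0.1, 0.65]]
labels = ["PD", "PD", "OS", "OS", "SC", "SC"]
weights = {"PD": 15, "SC": 5, "OS": 1}
ranking, relevance = global_ranking(samples, labels, weights)
print(ranking, relevance)
```

In this toy case the informative feature is ranked first, because its relevance is high for the heavily weighted class.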

Step II. Selection of an optimal number of features

Once the global feature ranking has been obtained, a minimum number of features leading to a saturation trend in the success rate of classification is chosen for each method. The success rate is calculated using the first features in the ranking, then successive features are added in an iterative process until the increment of the success rate is lower than a certain threshold (1 %). The n features that satisfy this condition are then selected.
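The stopping rule described above can be sketched as follows. This is only an illustration under stated assumptions: `evaluate` is a hypothetical callback standing in for the classifier evaluation and returns a success rate in percent, the search starts from a single feature, and the feature whose addition yields a gain below the threshold is not kept. The text does not specify these details.

```python
def select_until_saturation(ranking, evaluate, threshold=1.0):
    """Add ranked features one by one until the success rate saturates.

    ranking: feature indices ordered from most to least relevant.
    evaluate: hypothetical callback, list of features -> success rate in %.
    threshold: minimum gain (in percentage points) needed to keep adding.
    """
    selected = [ranking[0]]
    best_rate = evaluate(selected)
    for feature in ranking[1:]:
        candidate = selected + [feature]
        rate = evaluate(candidate)
        if rate - best_rate < threshold:
            break  # saturation: the gain fell below the threshold
        selected, best_rate = candidate, rate
    return selected, best_rate

# Hypothetical success rates (%) as a function of the number of features used.
toy_rates = {1: 70.0, 2: 82.0, 3: 88.5, 4: 89.0, 5: 89.2}
selected, rate = select_until_saturation(
    [12, 3, 40, 7, 25], lambda feats: toy_rates[len(feats)]
)
print(selected, rate)
```

With these made-up rates the fourth feature adds only 0.5 percentage points, so the procedure stops at three features.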

Area Under ROC Curve

The ROC curve is a graphical plot of the true-positive rate vs. the false-positive rate of a binary classifier as its discrimination threshold, i.e. the value above which a positive class prediction is made, is varied (Fawcett 2006). The area under a ROC curve (AUC) is used as a global measure of classifier performance that is invariant to the classifier discrimination threshold and the class distribution (Bradley 1997). Perfect separation of the classes corresponds to an AUC of 1, while random guessing gives an AUC of 0.5. Basically, the ROC feature selection method for binary classification problems consists in calculating a z statistic from the discriminant relevance of each feature \( x_j \), defined as the difference between the AUC of a classifier using all the features (\( \mathrm{AUC}_0 \)) and the AUC of a classifier that does not take into account the effect of feature \( x_j \) (\( \mathrm{AUC}_j \)) (Serrano et al. 2010).
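As an illustration of the quantities involved, the sketch below computes the AUC through its rank interpretation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half) and then the relevance of each feature as the drop in AUC when that feature is left out. The z statistic itself is not reproduced, since its formula is not given here. The `mean_score` function is a hypothetical stand-in for a trained classifier, and all names and data are assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve from the pairwise ranking of scores (labels are 0/1)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

def auc_drop_relevance(samples, labels, score_fn):
    """Relevance of feature j as AUC_0 - AUC_j (all features vs. all but j)."""
    all_features = list(range(len(samples[0])))
    auc_0 = auc(score_fn(samples, all_features), labels)
    relevance = []
    for j in all_features:
        kept = [f for f in all_features if f != j]
        auc_j = auc(score_fn(samples, kept), labels)
        relevance.append(auc_0 - auc_j)
    return relevance

def mean_score(samples, kept):
    """Hypothetical stand-in classifier: the score is the mean of the kept features."""
    return [sum(s[f] for f in kept) / len(kept) for s in samples]

# Hypothetical toy data: feature 0 is informative, feature 1 is not.
samples = [[0.9, 0.4], [0.7, 0.6], [0.2, 0.5], [0.1, 0.7]]
labels = [1, 1, 0, 0]
relevance = auc_drop_relevance(samples, labels, mean_score)
print(relevance)
```

Removing the informative feature lowers the AUC from 1 to 0.25 in this toy case, whereas removing the uninformative one leaves it unchanged.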

Classifier

The classifier used in this comparative study is a multilayer perceptron (MLP) with a single hidden layer, which is a type of ANN (Plaza et al. 2009). An MLP can use a wide range of learning techniques to determine the network parameters, the most common being backpropagation. In these classical learning methods, the parameters of the ANN are usually tuned iteratively, which entails several disadvantages, such as high computational complexity and convergence to local minima (Shih 2010). To avoid these problems, the MLP used in this work was trained using the extreme learning machine (ELM; Huang et al. 2006), in the same way as in Lorente et al. (2011). ELM is a more recent learning algorithm that determines the MLP parameters analytically instead of tuning them iteratively, providing good generalisation performance at an extremely fast learning speed.
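The idea behind extreme learning machine training can be sketched in a few lines of plain Python: the hidden-layer weights are drawn at random and left fixed, and only the output weights are computed, in a single analytical least-squares step. This is a simplified illustration and not the implementation used in this work. In particular, it solves ridge-regularised normal equations rather than using a pseudo-inverse, and the sigmoid activation, the number of hidden units, the `ridge` value and the toy data are all assumptions.

```python
import math
import random

def solve_linear_system(a, b):
    """Solve a x = b (b may have several columns) by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, len(m[r])):
                m[r][c] -= factor * m[col][c]
    n_out = len(b[0])
    x = [[0.0] * n_out for _ in range(n)]
    for r in range(n - 1, -1, -1):  # back substitution
        for k in range(n_out):
            s = m[r][n + k] - sum(m[r][c] * x[c][k] for c in range(r + 1, n))
            x[r][k] = s / m[r][r]
    return x

def hidden_output(x, weights, biases):
    """Sigmoid activations of the hidden layer for one input vector."""
    return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            for w, b in zip(weights, biases)]

def train_elm(samples, targets, n_hidden=4, ridge=1e-6, seed=0):
    """Random fixed hidden layer; output weights from one least-squares solve."""
    rng = random.Random(seed)
    n_inputs = len(samples[0])
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = [hidden_output(x, weights, biases) for x in samples]
    # Normal equations (H^T H + ridge * I) beta = H^T T
    hth = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    htt = [[sum(row[i] * t[k] for row, t in zip(h, targets))
            for k in range(len(targets[0]))] for i in range(n_hidden)]
    beta = solve_linear_system(hth, htt)
    return weights, biases, beta

def predict_elm(model, x):
    """Index of the output unit with the largest response."""
    weights, biases, beta = model
    h = hidden_output(x, weights, biases)
    outputs = [sum(h[i] * beta[i][k] for i in range(len(h))) for k in range(len(beta[0]))]
    return outputs.index(max(outputs))

# Hypothetical two-class toy data with one-hot targets.
samples = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.3, 0.25],
           [0.8, 0.9], [0.9, 0.7], [0.75, 0.8], [0.85, 0.95]]
targets = [[1.0, 0.0] for _ in range(4)] + [[0.0, 1.0] for _ in range(4)]
model = train_elm(samples, targets)
print([predict_elm(model, x) for x in samples])
```

No iterative weight updates are needed, which is the source of the speed advantage mentioned above.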

Approaches to the Problem of Decay Detection

In this work, three different approaches to the problem of decay detection in mandarins are considered, depending on the number of classes involved and the importance of each class (Lorente et al. 2011). Approach I involves the five classes described in the labelled set, all of them having equal importance or weight. Therefore, the weights of all the classes were considered to be equal when obtaining the global relevance.

It is, however, realistic to assume that the classes belonging to decaying skin should be more important for decay detection. Hence, approach II gives more importance to the decay classes (\( w_{PD} = w_{PI} = 15 \)), medium importance to the scar class (\( w_{SC} = 5 \)) and less to the sound classes (\( w_{GS} = w_{OS} = 1 \)). Furthermore, since the actual objective of a potential inspection system would be to detect decay, it is also important to study the detection of infected fruit alone, which leads to a binary problem: the separation between infected and non-infected fruit (approach III).
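The effect of the class weights on Eq. 1 can be seen in a short Python sketch. The weights for approaches I and II are those given above; the per-class relevance values, the names and the relabelling helper for approach III are hypothetical and serve only as an illustration.

```python
CLASSES = ["GS", "OS", "SC", "PD", "PI"]

# Class weights for the two multiclass approaches described in the text.
WEIGHTS = {
    "I": {c: 1 for c in CLASSES},
    "II": {"GS": 1, "OS": 1, "SC": 5, "PD": 15, "PI": 15},
}

DECAY_CLASSES = {"PD", "PI"}

def global_relevance(per_class_relevance, weights):
    """Weighted mean of the per-class relevance values of one feature (Eq. 1)."""
    numerator = sum(per_class_relevance[c] * weights[c] for c in weights)
    return numerator / sum(weights.values())

def to_binary(labels):
    """Approach III: collapse the five classes into infected / not infected."""
    return ["infected" if lab in DECAY_CLASSES else "not infected" for lab in labels]

# Hypothetical per-class relevance of a single feature.
r_j = {"GS": 0.2, "OS": 0.2, "SC": 0.4, "PD": 0.8, "PI": 0.6}
relevance_i = global_relevance(r_j, WEIGHTS["I"])    # plain mean: 2.2 / 5
relevance_ii = global_relevance(r_j, WEIGHTS["II"])  # 23.4 / 37
print(relevance_i, relevance_ii)
print(to_binary(["GS", "PD", "SC", "PI"]))
```

A feature that is mainly relevant for the decay classes obtains a higher global relevance under approach II than under approach I, so it moves up in the global ranking.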

Methodology of Comparison

Two different tests were carried out to compare the different selection techniques with the ROC feature selection method. In both tests, the comparison is based on evaluating the performance of the classifier using the different sets of features provided by the methods. The first test (test I) consists in selecting an optimum number of features for each method and for each approach. Therefore, a different number of features, chosen to maximise the classification success rate, may be obtained for each method. An alternative way to make the comparison is to use a fixed number of features for all methods (test II). For this test, we chose the number of features obtained with the ROC method for each approach.

Results and Discussion

The classification obtained using the ROC method is in general better than that obtained with the other methods in all cases except for MRMRd and MRMRq under the third approach. These results could be expected since the MRMR criterion is recognised as one of the most powerful techniques for feature selection (Peng et al. 2005; Ponsa and López 2007). The success of the ROC approach is similar to that obtained using the rest of the methods tested. The differences are not significant and therefore we cannot say that our approach is better than the others in terms of decay detection accuracy. It is, however, important to highlight that, apart from the MRMR methods in approach III, the best results are achieved using the ROC method for all tests and all approaches. This result should be taken into account because it is probably due to the fact that this method not only evaluates the selected features but also optimises the performance of the classifier. Therefore, although the results are similar, the ROC method can achieve slightly better scores.

Table 1 shows the results of the classifier performance evaluation for test I, using the different sets of features provided by the feature selection methods described above. The accuracy achieved with the ROC method is higher than that obtained with the other methods, except for MRMR in approach III. On the one hand, minimal redundancy methods try to extract the features with a high degree of relevance while avoiding those that carry redundant information. On the other hand, the ROC method provides the bands that, when used in a classification problem, fit the classifier in a much more robust way in terms of the accuracy and significance of the model.

Table 1 Results of the classifier performance evaluation using the features selected by the different methods for each approach, allowing a different number of features in each case (test I)

In general, the rest of the methods saturate the success criterion with fewer bands than those selected by the ROC method. This means, in theory, that these methods would need a higher number of bands to reach results closer to those of ROC. Therefore, test II was used to check the performance of the methods against ROC using the same number of bands: six for the first approach, seven for the second and four for the third. As shown in Table 2, the ROC feature selection method provides higher scores than most of the feature selection methods used in this study. As in test I, the only two methods surpassing ROC are MRMRd and MRMRq for the third approach. This shows that, in the most pessimistic scenario for the ROC method (allowing an increase in the number of features for the rest of the methods), it obtains better results than the others except in the case of the MRMR methods in approach III. Although the differences with the other methods are small, since all of them are good feature selection methods, in approach II, which is probably the most realistic scenario, the ROC method is clearly the one that obtains the best accuracy.

Table 2 Results of the classifier performance evaluation using the features selected by the different methods for each approach, but always employing the same number of features for each method (test II)

Conclusions

In the first test, the average classification success rate obtained using the ROC method is greater than that obtained with the other methods in almost every case, the exceptions being MRMRd and MRMRq under the third approach. When the same number of features is used for all the methods, the ROC feature selection method generally provides better results than most of the feature selection methods included in this comparative study: its average success rate is almost always greater than that of the other methods and is surpassed only by the MRMR methods under the third approach.

Therefore, the ROC feature selection method is a suitable technique that can be applied successfully to multiclass classification problems with a large number of features, such as the segmentation of hyperspectral images to detect decay in citrus fruit. Its results are at least similar to those of other recognised feature selection methods, with the advantage that, by its nature, it optimises the performance of the classifier.