Introduction

Date palm is one of the oldest and most important staple crops in the Middle East and North Africa, and its fruit is consumed all over the world. In Oman, approximately 50 % of the total cultivable land is under this crop (Al-Farsi et al. 2005). Oman is ranked among the top ten date-producing countries in the world (FAO 2010). Although annual production is high (276,400 tonnes/year), exports are relatively low (only 5,000 tonnes/year; Al-Yahyai and Al-Khanjari 2008; FAO 2010). One of the major reasons for the low exports from Oman is inconsistent quality (Al-Marshudi 2002).

Hardness is an important quality attribute in determining consumer acceptability, and excessive hardness is normally considered a defect related to moisture content. Based on hardness, dates may be broadly classified as soft, semi-hard or hard (Kader and Hussein 2009; Al-Janobi 1998). Soft and semi-hard dates are mostly consumed by humans or used to prepare various products, whereas hard dates are mainly used for specific products (such as date powder) or as animal feed.

In date handling facilities, hardness is assessed by human graders through visual or tactile inspection. Because human grading is subjective, an accurate objective method for identifying and separating dates by hardness would be highly beneficial in handling and processing facilities.

In the visual inspection method, the surface texture of a date is used as the criterion for evaluating hardness: the grader looks for a lack of smoothness on the date's skin. Therefore, in any computer vision-based method, surface characteristics are potential features for classification based on hardness.

Several researchers have used computer vision techniques for the inspection and grading of various agricultural and food products (Cubero et al. 2011; Schlüter et al. 2009; Lunadei et al. 2013; Fathi et al. 2011; Mollazade et al. 2012). In general, four types of texture are used in image analysis: statistical, structural, model-based and transform-based. However, statistical texture is the most widely used in the food industry because of its higher accuracy and lower computation time (Zheng et al. 2006).

Edge detection in a digital image is the process of identifying and locating sharp discontinuities within the image (Maini and Aggarwal 2009; Gonzalez and Woods 2008). These discontinuities are abrupt changes in pixel intensity which characterize the boundaries within the object (Maini and Aggarwal 2009). There are several edge detection methods available, and variables involved in the selection of an edge detection operator include edge orientation, noise environment and edge structure (Musoromy and Ramalingam 2010; Abdel-Qader et al. 2003).

Publications on date fruit quality evaluation using computer vision are very limited. Al-Janobi (2000) developed a computer vision system that used a color camera to grade Saudi dates (Sifri variety) based on color and texture analysis, with an average sorting error of 1.8 %. Fadel (2007) developed a neural network-based image processing algorithm for classifying five date varieties in the UAE, with accuracies ranging from 60 % for the Bomaan variety to 100 % for the Fard variety. Al-Ohali (2011) developed a computer vision-based grading system for dates according to size, shape, flabbiness intensity and defects using red, green and blue (RGB) images, and obtained an accuracy of 80 %. However, there is no documented study on classifying dates according to hardness.

The hypothesis of this work is that surface roughness would create visible signatures on binary images when an edge detection technique is applied to digital images of dates. Additionally, high-pass filtering should yield information concentrated on the surface texture. This work therefore proposes using features extracted with the Canny, Sobel, Prewitt and Laplacian of Gaussian (LoG) edge detection methods, together with a high-pass filter (HPF) implemented in the frequency domain, to classify the hardness of dates from three different regions of Oman.

The objective of this study was to determine the ability of edge detection features in classifying dates based on hardness.

Materials and Methods

Sample Collection

Samples of the Fard variety (the most processed variety in Oman) were obtained from three major date-growing regions: Al-Batinah, Al-Dakhliah and Al-Sharqiah. Standards for three common hardness grades (hard, semi-hard and soft) were discussed and set by the managers of three factories in Oman (Samail Dates Factory, United Date Processing and Bright Sun Dates). A group of trained graders at Bright Sun Dates sorted the dates into the three grades according to these standards, and the accuracy of the grading was then confirmed by an experienced senior grader at Bright Sun Dates. From each region, a representative sample of 600 dates (200 per grade) was selected, giving a total of 1,800 samples for this study.

Instrumental Measurement for Moisture Content and Hardness

Moisture content and hardness were measured for 180 date samples (20 dates/class/region = 20 × 3 × 3 = 180). Hardness was measured with a texture profile analyzer (TPA; Model TA XT2i, Stable Micro Systems, Surrey, England) using the procedure described by Rahman et al. (2012), which is briefly summarized here. A plate (7.5 cm diameter) compressed the date sample placed on a fixed table mounted on the test bench. The TPA was calibrated and set to zero by lowering the plate to the surface of the base. The seed was removed, and the date was cut into a 15 mm² piece and placed in the middle of the bench. During the hardness measurement, the plate moved at a pre-test speed of 10 mm/s, a test speed of 1 mm/s and a post-test speed of 10 mm/s. The sample was compressed to a total deformation of 75 % for two cycles. The TPA automatically recorded the force-time curve, which was used to compute the hardness (Rahman et al. 2012; Rahman and Al-Farsi 2005; Al-Rawahi et al. 2006).
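In texture profile analysis, hardness is conventionally taken as the peak force of the first compression cycle. The sketch below shows that calculation on a recorded force-time curve. It is an illustration, not the TPA software's routine, and the function name, the baseline `threshold` used to delimit the first cycle and the sample curve are hypothetical.

```python
def tpa_hardness(force, threshold=0.05):
    """Return the peak force (N) of the first compression cycle.

    Assumes `force` holds the sampled force-time curve starting at rest.
    The first cycle is taken to start when the force first exceeds the
    hypothetical `threshold` and to end when it falls back to or below it.
    """
    peak = None
    for f in force:
        if peak is None:
            if f > threshold:
                peak = f          # first cycle has started
        elif f > threshold:
            peak = max(peak, f)   # still inside the first cycle
        else:
            break                 # back at baseline: first cycle is over
    if peak is None:
        raise ValueError("force never exceeded the threshold")
    return peak


# Hypothetical two-cycle curve: the second, lower peak is ignored.
curve = [0.0, 1.2, 5.8, 9.4, 4.1, 0.0, 0.0, 3.3, 7.0, 2.2, 0.0]
print(tpa_hardness(curve))  # 9.4
```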

The remaining part of the same date was used for moisture content measurement following the procedure described by Rahman and Al-Farsi (2005). Approximately 3 g of the date sample was dried in an oven (Brabender, Duisburg, Germany) at 105 °C for 24 h. The sample was weighed before and after drying, and the moisture content was calculated from the mass loss.
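Table 1 reports moisture content on a wet basis, that is, the mass of water lost on drying as a percentage of the initial (wet) mass, MC = 100 (m_wet - m_dry) / m_wet. A small helper, with hypothetical masses for illustration:

```python
def moisture_content_wet_basis(mass_before_g, mass_after_g):
    """Moisture content (% wet basis) from sample mass before and after oven drying."""
    if mass_before_g <= 0 or not 0 <= mass_after_g <= mass_before_g:
        raise ValueError("expected wet mass > 0 and 0 <= dry mass <= wet mass")
    water_g = mass_before_g - mass_after_g   # mass of water removed by drying
    return 100.0 * water_g / mass_before_g   # relative to the initial (wet) mass


# Hypothetical sample: 3.00 g before drying, 2.40 g after drying.
print(round(moisture_content_wet_basis(3.0, 2.4), 2))  # 20.0
```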

Image Acquisition

Considering image size, data transfer, processing speed and cost, a monochrome camera was selected for this study. Images were taken with a monochrome camera (model: XCD-X700, Sony, Japan) with a charge-coupled device (CCD) sensor, which acquired uncompressed 8-bit images at a resolution of 1,024 × 768 pixels. The camera was connected to a computer (HP XW 9400 Workstation, dual-core AMD Opteron™ 2220 processor, 2.80 GHz, 3.50 GB of RAM). Two fluorescent lights (36 W, model: Dulux L, OSRAM, Italy) were used to illuminate the date sample. The distance between the camera and the sample was kept at 1 m in all experiments to simulate factory conditions. Before images were taken for each batch, the camera was calibrated with white and black standard color cards (Digital Kard XL, DGK Color Tools, USA). The image of each date fruit was captured remotely through the computer, saved and used for further data analysis.
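The calibration step with white and black cards is not described in detail. One common way to use such references is a two-point normalization that maps the black-card intensity to 0 and the white-card intensity to 255. The sketch below assumes that approach; the function name and the reference intensities are hypothetical.

```python
def two_point_normalize(raw, black, white, out_max=255.0):
    """Map a raw 8-bit intensity to [0, out_max] using black and white references.

    `black` and `white` are the mean intensities measured on the black and
    white cards (an assumed procedure, not one stated in the study).
    """
    if white <= black:
        raise ValueError("white reference must be brighter than black reference")
    value = (raw - black) / (white - black) * out_max
    return min(max(value, 0.0), out_max)   # clip values outside the reference range


# Hypothetical references: black card reads 20, white card reads 220.
image = [[20, 120], [220, 250]]
corrected = [[two_point_normalize(p, 20, 220) for p in row] for row in image]
print(corrected)  # [[0.0, 127.5], [255.0, 255.0]]
```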

Edge Detection

Spatial Domain

Different edge detection techniques have been used in various applications (Sharifi et al. 2002). Based on a preliminary analysis of representative images, four of them were selected: (i) Canny, (ii) Sobel, (iii) Roberts and (iv) LoG. One advantage of spatial domain edge detection techniques is that feature extraction is simplified, because the features are calculated on binary images.
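Because the spatial domain features are computed on binary edge maps, the step from a grey-level image to a binary image is central. As an illustration, a minimal pure-Python Sobel detector that thresholds the gradient magnitude is shown below. The fixed threshold and the zeroed border are assumptions, the study's own implementation and threshold settings are not stated, and detectors such as Canny and LoG add further steps (non-maximum suppression and hysteresis, or zero crossings) that are not shown.

```python
from typing import List

Image = List[List[float]]

# Sobel kernels for horizontal (x) and vertical (y) intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(img: Image, threshold: float) -> List[List[int]]:
    """Binary edge map (1 = edge) from the Sobel gradient magnitude.

    Border pixels are left as 0. Correlation is used instead of convolution,
    which only flips the gradient sign and leaves the magnitude unchanged.
    """
    rows, cols = len(img), len(img[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    p = img[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * p
                    gy += SOBEL_Y[i][j] * p
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[r][c] = 1
    return edges


# A vertical step from 0 to 100 produces a two-pixel-wide vertical edge.
step = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
for row in sobel_edges(step, threshold=200):
    print(row)
```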

Frequency Domain

When a high-pass filter (HPF) is applied in the frequency domain, more information can be obtained around the edges, which differ from one another because edges can cast different shadows. Additionally, because the Fourier transform is used in the filtering process, the values of the filtered images are no longer restricted to 256 non-negative integer values: their range is still 0 to 255, but they are represented in double precision, with intermediate values between the coarse integer-quantized pixel intensities. This provides more information from the filtered images themselves.

Feature Extraction

Features from Spatial Domain

Figure 1 shows a block diagram of the features extracted from each image.

Fig. 1 Features extracted from each image (only one branch is shown)

The following seven features were extracted using each edge detection method:

  1. F1: Total number of edges using eight connectivity
  2. F2: Total area of all edges
  3. F3: Area of the top 20 % of edges in terms of size, using eight connectivity
  4. F4: Area of the bottom 20 % of edges in terms of size, using eight connectivity
  5. F5: Total number of edges using four connectivity
  6. F6: Area of the top 20 % of edges in terms of size, using four connectivity
  7. F7: Area of the bottom 20 % of edges in terms of size, using four connectivity

Eight and four connectivity yield different values because four connectivity does not count diagonally adjacent pixels as part of the same edge.
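The seven spatial domain features can be computed by labeling connected edge components under eight and four connectivity. In this minimal sketch, "top 20 %" is interpreted as the largest 20 % of components by pixel count (rounded up to at least one component) and area is measured as a pixel count; both interpretations, and the helper names, are assumptions.

```python
import math
from typing import Dict, List

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def component_sizes(edges: List[List[int]], neighbours) -> List[int]:
    """Pixel counts of the connected edge components in a binary image."""
    rows, cols = len(edges), len(edges[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if edges[r][c] and not seen[r][c]:
                stack, size = [(r, c)], 0
                seen[r][c] = True
                while stack:  # iterative flood fill of one component
                    y, x = stack.pop()
                    size += 1
                    for dy, dx in neighbours:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and edges[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                sizes.append(size)
    return sizes


def share_area(sizes: List[int], fraction: float, largest: bool) -> int:
    """Total area of the largest (or smallest) `fraction` of components."""
    if not sizes:
        return 0
    k = math.ceil(fraction * len(sizes))  # assumed rounding rule
    return sum(sorted(sizes, reverse=largest)[:k])


def edge_features(edges: List[List[int]]) -> Dict[str, int]:
    s8 = component_sizes(edges, N8)
    s4 = component_sizes(edges, N4)
    return {
        "F1": len(s8),
        "F2": sum(s8),  # total edge area, identical for both connectivities
        "F3": share_area(s8, 0.2, largest=True),
        "F4": share_area(s8, 0.2, largest=False),
        "F5": len(s4),
        "F6": share_area(s4, 0.2, largest=True),
        "F7": share_area(s4, 0.2, largest=False),
    }


# The diagonal pair at the top left is one edge under eight connectivity
# but two separate edges under four connectivity.
demo = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0],
]
print(edge_features(demo))
```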

Features from Frequency Domain

Eight features were calculated per image: (i) entropy, (ii) variance, (iii) median absolute deviation, (iv) interquartile range, defined as the difference between the 75th and 25th percentiles of the values, (v) median and (vi–viii) central sample moments of orders 3 to 5.
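The eight statistics can be computed on the flattened pixel values of a filtered image. A standard-library sketch follows. How the entropy histogram was built and which percentile interpolation was used are not stated, so the 256-bin histogram of rounded intensities and Python's default quantile method are assumptions, as are the function and key names.

```python
import math
import statistics
from typing import Dict, Sequence


def central_moment(values: Sequence[float], order: int) -> float:
    """Central sample moment: mean of (x - mean) ** order."""
    mean = statistics.fmean(values)
    return statistics.fmean((v - mean) ** order for v in values)


def histogram_entropy(values: Sequence[float]) -> float:
    """Shannon entropy (bits) of a 256-bin histogram of rounded intensities."""
    counts = [0] * 256
    for v in values:
        counts[min(max(round(v), 0), 255)] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def frequency_features(values: Sequence[float]) -> Dict[str, float]:
    """Eight statistics computed on the flattened pixel values of one image."""
    med = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    feats = {
        "entropy": histogram_entropy(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "median_abs_dev": statistics.median([abs(v - med) for v in values]),
        "iqr": q3 - q1,
        "median": med,
    }
    for order in (3, 4, 5):
        feats[f"moment{order}"] = central_moment(values, order)
    return feats


print(frequency_features([1.0, 2.0, 3.0, 4.0, 5.0]))
```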

Results and Discussion

Moisture Content and Hardness

Figure 2 shows hardness plotted against moisture content for the date samples. Although dates with lower moisture content had higher hardness, substantial overlap was observed among classes. In general, soft dates had the highest moisture content and the lowest hardness, and hard dates had the lowest moisture content and the highest hardness (Tables 1 and 2). However, there were no differences in moisture content between soft and semi-hard dates in the Al-Batinah and Al-Sharqiah regions. Similarly, in the Al-Dakhliah region, semi-hard and hard dates had the same moisture content. The hardness of hard dates from Al-Batinah was significantly lower than that of hard dates from the other two regions (Table 2).

Fig. 2 Relationship between hardness and moisture content in three classes of dates

Table 1 Moisture content (% wet basis) of date samples (n = 20/region/class)
Table 2 Hardness (N) of date samples (n = 20/region/class)

Monochrome Images

Monochrome images of date samples from the three regions of Oman are shown in Fig. 3. Although a pattern appears to exist among the three grades, there was considerable overlap in visual appearance between regions and between replicates within each region. For example, Fig. 3 shows that the semi-hard sample from the Al-Batinah region resembles the soft dates from the Al-Sharqiah and Al-Dakhliah regions. Even though the samples belonged to the same variety, they differed inherently in external properties such as color, shape and size, and in internal properties such as moisture content.

Fig. 3 Typical monochrome images of dates from different regions and classes

Figure 4 shows the result of applying the Canny edge detection operator to the three grades of dates. A quick look at these images suggests that the number of connected components differs considerably among the three cases: there are 234, 151 and 169 edges for hard, semi-hard and soft dates, respectively. Intuitively, one would expect the semi-hard case to have more connected components than the soft case; however, this is not necessarily true for all images, as the example in Fig. 4 shows. The results of applying different edge detection techniques to a selected region of a date image are shown in Fig. 5.

Fig. 4 Canny edge detection for typical dates of three classes

Fig. 5 Edge detection of a typical date using different techniques. a Original image; the rectangle highlights the area used to show the results of the different edge detection techniques. b Canny. c Sobel. d Prewitt. e LoG

Figure 6 shows the results of filtering the same date image with two different HPFs. Both filters have a Gaussian shape (Gonzalez and Woods 2008), one with a standard deviation of 0.2 and the other with 0.1; these values are given in normalized frequency. For the frequency domain features, the HPFs shown in Fig. 6a and b were used to obtain the images shown in Fig. 6c and d, respectively. Note that the two resulting images are very similar; thus, features extracted with one filter would be very similar to those extracted with the other.
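A Gaussian high-pass filter in the form given by Gonzalez and Woods (2008) has the transfer function H(u, v) = 1 - exp(-D²(u, v) / 2σ²), where D is the distance from the origin of the frequency plane. The sketch below assumes that "normalized frequency" means cycles per pixel, so that D lies between 0 and √0.5. It uses a naive discrete Fourier transform so that only the standard library is needed, and it does not reproduce any rescaling of the output to the 0 to 255 range. Because H(0, 0) = 0, the mean (DC) level of the image is removed.

```python
import cmath
import math
from typing import List


def normalized_freq(k: int, n: int) -> float:
    """Frequency of DFT index k in cycles/pixel, wrapped to [-0.5, 0.5)."""
    return k / n if k < n / 2 else k / n - 1.0


def gaussian_hpf(img: List[List[float]], sigma: float) -> List[List[float]]:
    """Filter img with H(u, v) = 1 - exp(-D^2 / (2 sigma^2)) in the DFT domain.

    A naive O(N^4) DFT keeps the sketch dependency-free; real images would
    use an FFT instead.
    """
    rows, cols = len(img), len(img[0])
    # Forward 2-D DFT: F[u][v].
    F = [[sum(img[x][y] * cmath.exp(-2j * math.pi * (u * x / rows + v * y / cols))
              for x in range(rows) for y in range(cols))
          for v in range(cols)] for u in range(rows)]
    # Multiply by the Gaussian high-pass transfer function.
    for u in range(rows):
        for v in range(cols):
            d2 = normalized_freq(u, rows) ** 2 + normalized_freq(v, cols) ** 2
            F[u][v] *= 1.0 - math.exp(-d2 / (2.0 * sigma ** 2))
    # Inverse 2-D DFT; H is symmetric, so the result is real up to rounding.
    return [[(sum(F[u][v] * cmath.exp(2j * math.pi * (u * x / rows + v * y / cols))
                  for u in range(rows) for v in range(cols)) / (rows * cols)).real
             for y in range(cols)] for x in range(rows)]


# A checkerboard sits at the highest frequency (D^2 = 0.5), so it passes
# almost unchanged, while a flat image is removed entirely.
checker = [[(-1.0) ** (x + y) for y in range(4)] for x in range(4)]
print([round(p, 4) for p in gaussian_hpf(checker, sigma=0.2)[0]])
```

Running the same image through σ = 0.2 and σ = 0.1 and comparing the outputs is a quick way to check how much the stop-band width matters for a given texture.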

Fig. 6 Image of a typical date after high-pass filtering (HPF). a HPF with σ = 0.2. b HPF with σ = 0.1. c Image filtered with a. d Image filtered with b

Feature Selection

Spatial Domain

The features were initially evaluated using a sequential feature selection method available in Matlab (Version 7, MathWorks, Natick, MA) that performs tenfold cross-validation with the number of misclassified observations as the criterion. Three types of discriminant function were used to define the criterion: (a) linear, which fits a multivariate normal density to each group; (b) quadratic, which fits multivariate normal densities with covariance estimates stratified by group; and (c) Mahalanobis, which uses the Mahalanobis distance (Mahalanobis and Chandra 1936). Figure 7a shows the effect of adding more features on this classification-error criterion. It suggests that selecting more than 15 features would not necessarily yield better results. In this same example, features extracted with the Canny edge detector were favored a total of 18 times per pass in the sequential selection algorithm and those from the LoG 14 times, whereas the Sobel features were chosen only six times and the Roberts features seven times. Adding features from more edge detection techniques is therefore largely redundant, and using only the Canny and LoG features would be enough for this type of classification. Note, however, that the criterion values are very low in all cases for this application.

To further confirm the selection process and improve the final classification method, an artificial neural network (ANN) was used. Figure 7b shows the results when using (i) the Canny and LoG features, (ii) the Canny, LoG and Roberts features, and (iii) all features, including those from the Sobel detector. Using all the features does not necessarily yield the best results, and using more features slows down the training of the network because the number of training iterations (epochs) increases.
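The sequential selection idea can be illustrated with a simplified re-implementation: at each pass, the feature giving the lowest tenfold cross-validated misclassification rate is added to the selected set. This is not Matlab's routine. For brevity, a Euclidean nearest-centroid rule stands in for the linear, quadratic and Mahalanobis discriminant criteria, and all function names and the toy data are hypothetical.

```python
import random
from typing import List, Tuple


def nearest_centroid_errors(train_X, train_y, test_X, test_y, feats: List[int]) -> int:
    """Count test samples misclassified by a nearest-centroid rule on `feats`."""
    sums, counts = {}, {}
    for x, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += x[f]
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def dist2(x, lab):
        return sum((x[f] - centroids[lab][i]) ** 2 for i, f in enumerate(feats))

    return sum(1 for x, label in zip(test_X, test_y)
               if min(centroids, key=lambda lab: dist2(x, lab)) != label)


def cv_criterion(X, y, feats: List[int], k: int = 10, seed: int = 0) -> float:
    """Fraction of samples misclassified in k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # same folds for every candidate set
    errors = 0
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        errors += nearest_centroid_errors(
            [X[i] for i in train], [y[i] for i in train],
            [X[i] for i in fold], [y[i] for i in fold], feats)
    return errors / len(X)


def sequential_forward_selection(X, y, max_features: int, k: int = 10) -> List[Tuple[int, float]]:
    """Greedily add features; return (feature, criterion) after each pass."""
    selected: List[int] = []
    history: List[Tuple[int, float]] = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        best = min(remaining, key=lambda f: cv_criterion(X, y, selected + [f], k))
        selected.append(best)
        remaining.remove(best)
        history.append((best, cv_criterion(X, y, selected, k)))
    return history


# Toy data: feature 0 is uninformative, feature 1 separates the two classes.
X, y = [], []
for i in range(40):
    c = i % 2
    X.append([float((i * 7) % 5), 10.0 * c + 0.1 * (i % 3)])
    y.append(c)
print(sequential_forward_selection(X, y, max_features=2))
```

Plotting the criterion in `history` against the number of selected features gives the kind of curve described for Fig. 7a.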

Fig. 7 Selection of features and classification errors while using spatial domain features. a Evaluation of features using sequential feature selection with three classification error metrics. b Mean squared error (MSE) versus number of training iterations of an artificial neural network (ANN) while using three sets of features

Frequency Domain

In this case, the sequential selection algorithm was not used primarily as a selection scheme, since only eight features were available, but to examine the effect of a wider stop-band coverage by comparing filters with σ = 0.2 and σ = 0.1 (Fig. 8a). As can be seen, the coverage of the HPF is not critical. Note that the features corresponding to the 4th and 5th moments were selected last, making them candidates for elimination in a final classifier. Figure 8b shows the results of training with all features and with these two moments excluded. As can be seen, the ANN performs better with fewer features, at the expense of requiring a few more training iterations.

Fig. 8 Selection of features and classification errors while using frequency domain features. a Sequential feature selection results obtained while using two filters (one with σ = 0.1 and the other one with σ = 0.2; no considerable difference exists between both cases). b MSE versus number of training iterations of ANN while using two sets of features

Therefore, the feature set used for classification, Fs (the result of sequential selection), omitted the 4th and 5th moment features as well as the Sobel edge detection features.

Classification

The classification of dates based on hardness was carried out using two approaches. In the first approach, the three classes (soft, semi-hard and hard) were treated separately (three-class model). In handling and processing facilities, dates are identified only as soft or hard for certain applications. Therefore, in the second approach, the classification performance of the monochrome imaging system was determined for two classes (soft and hard), with images of semi-hard and hard dates grouped together and treated as hard dates.
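The two-class regrouping and the per-class accuracies reported in Tables 3 and 4 can be expressed as follows, assuming that each tabulated accuracy is the percentage of a grade's samples that were classified correctly. The label strings and helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Semi-hard dates are merged into the hard class for the two-class model.
TWO_CLASS = {"soft": "soft", "semi-hard": "hard", "hard": "hard"}


def to_two_class(labels: List[str]) -> List[str]:
    return [TWO_CLASS[lab] for lab in labels]


def per_class_accuracy(true: List[str], pred: List[str]) -> Dict[str, float]:
    """Percentage of samples of each true class that were predicted correctly."""
    totals = Counter(true)
    correct = Counter(t for t, p in zip(true, pred) if t == p)
    return {c: 100.0 * correct[c] / n for c, n in totals.items()}


true = ["soft", "semi-hard", "hard", "hard"]
pred = ["soft", "hard", "hard", "semi-hard"]
print(per_class_accuracy(true, pred))                                # three-class
print(per_class_accuracy(to_two_class(true), to_two_class(pred)))    # two-class
```

In this toy example, the confusions between semi-hard and hard disappear once the two grades are merged, which mirrors why a two-class model can score higher than a three-class one.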

A two-layer feed-forward backpropagation neural network was used, with the weight and bias values updated according to the Levenberg–Marquardt optimization algorithm (Wilamowski and Chen 1999). Hyperbolic tangent sigmoid transfer functions were used in the architecture, and the hidden layer had 55 neurons. Of the image set, 70 % was used for training and the remaining 30 % for independent validation.
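The network structure can be sketched as a forward pass through a 55-neuron hidden layer and an output layer, both using the hyperbolic tangent sigmoid (mathematically equal to tanh), together with a random 70/30 split of the samples. Levenberg–Marquardt training is omitted. The input size of 10, the three outputs, the tansig output layer, the weight initialization and the split seed are assumptions for illustration.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[n_out][n_in], biases[n_out])


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights and zero biases for one fully connected layer."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def tansig_layer(x: List[float], weights, biases) -> List[float]:
    """Weighted sum plus bias, passed through the hyperbolic tangent sigmoid."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(x: List[float], hidden: Layer, output: Layer) -> List[float]:
    return tansig_layer(tansig_layer(x, *hidden), *output)


def split_70_30(n: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Random indices for 70 % training and 30 % validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * n)
    return idx[:cut], idx[cut:]


rng = random.Random(0)
hidden = init_layer(10, 55, rng)   # 55 hidden neurons, as in the text
output = init_layer(55, 3, rng)    # e.g. one output per hardness grade
print(forward([0.5] * 10, hidden, output))
train, val = split_70_30(1800)
print(len(train), len(val))  # 1260 540
```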

Three-Class Model

An ambitious attempt to classify the dates into hard, semi-hard and soft grades regardless of region was carried out first. The overall classification accuracy of a single ANN (irrespective of region) for the three-class model was 75 %. A separate ANN was then developed for each region, and its classification accuracy was determined. Because the accuracy was relatively low for most regions or grades, a two-step method was tested, in which the most confused grade was least emphasized in the first step and most emphasized in the second step. Combining two ANNs for a final classification in this way can be thought of as combining different classifiers, which has been shown to be an effective way of improving classification results (Breukelen and Duin 1998; Sharma et al. 2006).
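How the two ANNs were combined is not specified beyond the emphasis placed on the most confused grade. One possible reading, sketched here with hypothetical classifier functions, is a cascade in which a second-step classifier re-examines only the samples that the first step assigned to a set of confusable grades. It illustrates classifier combination in general and is not the authors' procedure.

```python
from typing import Callable, List, Sequence, Set

Classifier = Callable[[Sequence[float]], str]


def two_step_classify(samples: List[Sequence[float]], first: Classifier,
                      second: Classifier, recheck: Set[str]) -> List[str]:
    """Cascade: `second` overrides `first` whenever `first` predicts a label in `recheck`."""
    results = []
    for x in samples:
        label = first(x)
        if label in recheck:
            label = second(x)  # the second step focuses on the confusable grades
        results.append(label)
    return results


# Hypothetical rules on a single feature: step 1 separates soft from the rest,
# step 2 then splits the remaining samples into semi-hard and hard.
step1 = lambda x: "soft" if x[0] < 1 else "hard"
step2 = lambda x: "semi-hard" if x[0] < 2 else "hard"
print(two_step_classify([[0.5], [1.5], [3.0]], step1, step2, {"hard"}))
```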

The accuracy of the two-step ANN for the three-class model (region-wise) is given in Table 3. The soft dates from all regions were classified with 93 % to 97 % accuracy. The hard dates from the Al-Dakhliah region yielded the lowest hard-grade accuracy (75 %). Although the classification of the semi-hard grade was emphasized in the second step, its accuracy was still only 48 % to 74 %.

Table 3 Classification accuracy (%) for region specific ANN in the three-class models

Two-Class Model

Classification with a single ANN (irrespective of region) yielded an overall accuracy of 87 % for the two-class model. The accuracy of the region-wise ANNs is shown in Table 4. As in the three-class model, soft dates were classified more accurately (more than 90 %) than hard dates. For both grades, the accuracy was highest for dates from Al-Sharqiah and lowest for dates from Al-Dakhliah.

Table 4 Classification accuracy (%) for region specific ANN in the two-class models

In conclusion, the edge detection features yielded acceptable accuracies (more than 80 % in many cases) for classifying dates by hardness using monochrome images. The semi-hard dates overlapped with the other grades and yielded lower classification accuracies. This method has the potential to be used in developing computer vision classification systems for date handling and processing facilities. However, its effectiveness must be studied for the Fard variety from other growing regions and for other varieties.