Fruit detection in natural environment using partial shape matching and probabilistic Hough transform

Lin, Guichao; Tang, Yunchao; Zou, Xiangjun; Cheng, Jiabing; Xiong, Juntao

doi:10.1007/s11119-019-09662-w

Fruit detection in natural environment using partial shape matching and probabilistic Hough transform

Published: 24 April 2019

Volume 21, pages 160–177, (2020)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Precision Agriculture Aims and scope Submit manuscript

Fruit detection in natural environment using partial shape matching and probabilistic Hough transform

Download PDF

Guichao Lin^1,2,
Yunchao Tang³,
Xiangjun Zou¹,
Jiabing Cheng¹ &
…
Juntao Xiong¹

2300 Accesses
103 Citations
Explore all metrics

Abstract

This paper proposes a novel technique for fruit detection in natural environments which is applicable in automatic harvesting robots, yield estimation systems and quality monitoring systems. As most color-based techniques are highly sensitive to illumination changes and low contrasts between fruits and leaves, the proposed technique, conversely, is based on contour information. Firstly, a discriminative shape descriptor is derived to represent geometrical properties of arbitrary fragment, and applied to a bidirectional partial shape matching to detect sub-fragments of interest that match parts of a reference contour. Then, a novel probabilistic Hough transform is developed to aggregate these sub-fragments for obtaining fruit candidates. Finally, all fruit candidates are verified by a support vector machine classifier trained on color and texture features. Citrus, tomato, pumpkin, bitter gourd, towel gourd and mango datasets were provided. Experiments on these datasets demonstrated that the proposed approach was competitive for detecting most type of fruits, such as green, orange, circular and non-circular, in natural environments.

Color-, depth-, and shape-based 3D fruit detection

Article 30 March 2019

Shape-based Fruit Recognition and Classification

Automatic Fruit Image Recognition System Based on Shape and Color Features

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

The automatic detection of fruit is of great importance for a wide range of applications in the agriculture field. Robots equipped with automatic fruit detection can detect and locate products to harvest automatically, which may significantly reduce labor tensions in countries such as China (Xiang et al. 2014; Zou et al. 2012, 2016; Wang et al. 2016). Automatic fruit detection may also assist in smart quality monitoring during the fruit growing process, as the appearance of the product is a valuable cue—such information can help farmers make useful decisions regarding fertilization or harvesting (Herold et al. 2005). It can also provide appropriate number of fruits in an orchard, which can be used to estimate yield as a necessary step to plan the harvest schedule (Font et al. 2014).

There currently exists a wide variety of color-based fruit detection methods, but challenges, such as low contrast between fruits and leaves, unstable illumination condition and occlusion, probably render them ineffective (Luo et al. 2016; Barnea et al. 2016). As the contour of an object is invariant to color and lighting conditions and remains stable to intra-category shape variation, contour is often employed for object detection (Murillo-Bracamontes et al. 2012; Lu and Sang 2015; Li et al. 2016). The aim of this study was to exploit contour rather than color information for automatic fruit detection. Figure 1 shows the contours marked in different colors. As can be seen, there are two issues to contour-based methods: (1) target fruit contours are often broken into inconsistent fragments which may be connected to non-target fragments. An example is given in Fig. 1c, where the edge fragment of each citrus either contains part of the contour of the leaf or shadow on the fruit surface. (2) A huge proportion of the input image is occupied by non-target fragments, which thus makes valid detection extremely difficult (see Fig. 1d). Inspired by Donoser et al. (2009) who presented partial shape matching (PSM) to identify parts of an edge fragment matching parts of a given reference contour from cluttered background, PSM is applied to detect sub-fragments of interest, then a novel probabilistic Hough transform (PHT) is investigated to aggregate these sub-fragments to obtain fruit candidates, and finally a support vector machine (SVM) classifier is built to exclude false candidates.

Related work

Automatic fruit detection with color cues has been extensively studied over recent decades (Wang et al. 2013; Payne et al. 2014; Bargoti and Underwood 2017; Puttemans et al. 2016; Zhao et al. 2016). Interested readers are recommended to read recent reviews by Kapach et al. (2012) and Gongal et al. (2015). Here, the background material of sub-fragment detection and aggregation, and fruit classification is discussed.

Sub-fragment detection and aggregation

PSM is a commonly used, generalized algorithm (Donoser et al. 2009) to detect sub-fragments matching parts of a given reference contour. Riemenschneider et al. (2010) invented an angular descriptor to represent geometrical properties of the fragment and applied this descriptor to PSM to detect candidate sub-fragments, followed by employing PHT to group these sub-fragments into objects. Ma and Latecki (2011) first utilized lengths and orientations cues to increase the discrimination power of the descriptor, then used PSM to obtain sub-fragments of interest, and finally aggregated these sub-fragments into possible objects by finding maximum weight subgraphs in a graph using the graph clustering method proposed by Liu et al. (2010). Such a length-based descriptor is not invariant to scale change, thus limiting the application of this method. Fan et al. (2015) proposed a variation of PSM by breaking both the edge fragments and reference contour into shorter and continuous sub-fragments between any two neighboring curvature extreme points (CEP), then calculating the similarity between each pair of sub-fragments, and selecting the top 10% best matches as final results. PHT was used to merge these sub-fragments. Unfortunately, this method is not suited to circular fruits as no CEPs exist on their contours.

Generative model, which leverages a learned reference contour, is another popular method (Fergus et al. 2003; Opelt et al. 2006; Lu et al. 2009). Shotton et al. (2008) first learned parts of the reference contour using k-medoids from training samples, then detected them in the input image by applying the oriented chamfer matching algorithm (OCM), and finally used a cascaded classifier to combine fragments. Yu et al. (2017) also utilized OCM to locate sub-fragments of interest and formulated the sub-fragment aggregation problem as minimizing a Markov random field energy solved by the loopy belief propagation algorithm (Pearl 1982). However, fruit contours are too simple to be learned. To avoid complex learning, Su et al. (2015) directly divided a template into overlapped parts, maximized the posterior probability for the fragments in the image and the locations of object center given parts of a template to obtain sub-fragments of interest, and finally applied dynamic programming to assemble these sub-fragments. This method was effective, though the length and overlapping area of the parts must be determined carefully.

Alternative approaches utilize color or texture features of the target fruit to extract fragments of interest, and use circular fitting or circular Hough transform (CHT) to obtain fruit targets (Lu and Sang 2015; Linker 2017; Lu et al. 2018). Lu et al. (2018) applied CHT on hierarchical contours of an input image to detect citrus fruits. Linker et al. (2012) used a k-nearest neighbor (KNN) classifier to detect seed areas, segmented the contours of the seed areas into the arc and non-arc sub-fragments, then grouped adjacent arc sub-fragments into apple targets. This method was modified by Qureshi et al. (2016) to detect mango fruits. Circular models render these methods only applicable, naturally, to circular fruits.

In this research, a generalized algorithm that could deal with most type of fruits, such as circular, and non-circular, in the natural environment was investigated. PSM is first utilized to detect sub-fragments of interest. PHT, which has been well-established and successfully applied in previous studies (Maji and Malik 2009; Ommer and Malik 2009), is then used to assemble candidate sub-fragments.

Fruit classification

A classifier can be used to remove false positives after fragment aggregation. Several methods have been investigated to classify fruits, such as K-means clustering (Wachs et al. 2010), Bayesian classifier (Slaughter and Harrell 1989), KNN clustering (Kurtulmus et al. 2014), AdaBoost classifier (Luo et al. 2016), and SVM (Ji et al. 2012). Ji et al. (2012) had demonstrated the advantage of SVM in classification, therefore SVM is used in the final verification stage.

Materials and methods

Image acquisition

The datasets encompassed six categories of fruits: citrus, tomato, pumpkin, bitter gourd, towel gourd and mango, and comprised a total of 450 images (Table 1). Citrus images were gathered from an orchard in Guangzhou, China on December 2016, tomato, pumpkin, bitter gourd and towel gourd images were taken from the Nansha Base at the Guangzhou Academy of Agricultural Science on November 2017, and mango images were captured in an orchard located in South China Agricultural University, China, by using a consumer digital camera (DSC-W800, Sony Inc., Japan). All images were saved as JPG format at a resolution of 5184 × 3888 pixels. The imresize function with a bilinear interpolation method in MATLAB was performed to resize the images to 480 × 360 pixels to improve the computational efficiency. Ground-truth bounding boxes for each image were provided manually. A reference model for each category was also supplied (see Fig. 2). Examples of the datasets are shown in Fig. 3, and the acquisition scene of the datasets is shown in Fig. 4.

Table 1 The proposed datasets

Full size table

Algorithm overview

The flow chart of the proposed algorithm is shown in Fig. 5, which includes four steps. Edges of an input image are first detected by the gPb detector (Maire et al. 2008) and grouped to form fragments (Kovesi. 2000). Then, a novel shape descriptor is presented to measure the similarity between arbitrary pairs of fragments and applied to PSM for detecting candidate sub-fragments that are similar to parts of a given reference contour. Thirdly, a novel PHT is investigated to aggregate these sub-fragments, thus obtaining fruit candidates. Finally, an SVM classifier trained on color and texture features is used to remove false positives. More details are provided in the following subsection.

It should be noted that as gPb outperforms most edge detectors, such as Canny, Prewitt, and Sobel, in terms of precision and recall (Arbelaez et al. 2011), gPb is used to extract edges from the input image.

Partial shape matching

The aim is to detect candidate sub-fragments that are similar to parts of a reference contour. A rotation and scale invariant shape descriptor is first presented to capture the geometric property of a given fragment and used to measure the similarity between any pair of fragments. This descriptor is then applied to PSM to detect candidate sub-fragments.

Shape descriptor

Let f denote an edge fragment with a sequence of edge pixels $\varvec{f} = \left\{ {\varvec{p}_{1} ,\varvec{p}_{2} , \ldots ,\varvec{p}_{N} } \right\}$, where p_i= (x_i, y_i)^T is the i’th edge pixel on the fragment and N is the number of edge pixels on the fragment. Edge fragment f is smoothed by convolving with a Gaussian kernel with standard deviation σ (He and Yung 2008), which was set to 3 in the experiments. Next, the tangent vector t_i and normal vector n_i of each edge pixel p_i are calculated as follows:

$$\left\{ {\begin{array}{*{20}c} {\varvec{t}_{i} = \left( {\Delta x_{i} ,\Delta y_{i} } \right)^{\text{T}} /\sqrt {\left( {\Delta y_{i} } \right)^{2} + \left( {\Delta x_{i} } \right)^{2} } } \\ {\varvec{n}_{i} = \left( { -\Delta y_{i} ,\Delta x_{i} } \right)^{\text{T}} /\sqrt {\left( {\Delta y_{i} } \right)^{2} + \left( {\Delta x_{i} } \right)^{2} } } \\ \end{array} } \right.$$

(1)

where ∆x_i, ∆y_i are the first-order partial derivatives in the x and y directions, respectively, and can be approximated by finite differences. Based on the tangent and normal vector cues, two N × N matrixes, distance matrix D(f) and angle matrix θ(f), are defined and used as descriptors to encode the geometric structure of fragment f. The element of the ith row and jth column of distance matrix D(f) is defined as the distance from edge pixel p_i to the tangent vector of edge pixel p_j (Fig. 6):

$$\varvec{D}\left( \varvec{f} \right)_{ij} = \frac{{\left| {\left( {\varvec{p}_{i} - \varvec{p}_{j} } \right)^{\text{T}} \varvec{n}_{j} } \right|}}{{\varvec{p}_{i} - \varvec{p}_{j2} }}$$

(2)

where i, j = 1,…, N. As depicted in Eq. (2), because the distance descriptor here utilizes the distance and normal vector properties of a fragment, it is intuitively more discriminative than the descriptors developed by Ma and Latecki (2011) and Fan et al. (2015) that only used the distance property.

The element of the ith row and jth column of angle matrix θ(f) is defined as:

$$\varvec{\theta}\left( \varvec{f} \right)_{ij} = min\left\{ {\theta_{ij} ,\theta_{ji} } \right\}$$

(3)

where θ_ij ∈ [0, 2π] refers to the relative angle at which tangent vector t_j is rotated anticlockwise to be parallel to vector (p_i–p_j) (Tico and Kuosmanen 2003). Because θ_ij considers the signed angle between two orientations, it is more robust than that of Ma and Latecki (2011) which directly uses the absolute angle of the orientations. Another advantage is that θ(f) is invariant to scale and rotation.

Given another edge fragment of length N, denoted as g, an N × N affinity matrix measuring the similarities between the corresponding pairs of elements of the two descriptors is drawn as follows (Ma and Latecki 2011):

$$\varvec{A}\left( {\varvec{f},\varvec{g}} \right) = exp\left( { - \frac{{\left( {\varvec{D}\left( \varvec{f} \right) - \varvec{D}\left( \varvec{g} \right)} \right)^{2} }}{{\sigma_{D}^{2} }}} \right) + { \exp }\left( { - \frac{{\left( {\varvec{\theta}\left( \varvec{f} \right) -\varvec{\theta}\left( \varvec{g} \right)} \right)^{2} }}{{\sigma_{\theta }^{2} }}} \right)$$

(4)

where σ_D and σ_θ are hyper-parameters, which not only play the role of normalization, but also can be used to balance the contributions of the distance and angle descriptors. These two values were set to 0.2 and π/6, respectively, in the experiments. The similarity between edge fragments f and g is defined as:

$$s\left( {\varvec{f},\varvec{g}} \right) = \frac{1}{{N^{2} }}\mathop \sum \limits_{i = 1}^{N} \mathop \sum \limits_{j = 1}^{N} \varvec{A}\left( {\varvec{f},\varvec{g}} \right)_{ij}$$

(5)

Principle of partial shape matching

The purpose of PSM is to identify parts of a fragment matching parts of a given reference contour.

Let M = {q₁, q₂, …, q_M} denote the edge pixel sequence of a given reference contour, where M is the number of edge pixels on this contour; let E = {e¹, …, e^K} be the edge fragments obtained from the input image, where e^k denotes the k’th edge fragment of E with a sequence of edge pixels as $\varvec{e}^{k} = \left\{ {\varvec{p}_{1}^{k} ,\varvec{p}_{2}^{k} , \ldots ,\varvec{p}_{{N_{k} }}^{k} } \right\}$, and N_k is the number of edge pixels on the k’th edge fragment.

To determine all sub-fragments of interest, let $\varvec{e}^{k} \left( {e,l} \right) = \left\{ {\varvec{p}_{e}^{k} , \ldots ,\varvec{p}_{e + l}^{k} } \right\} \subseteq \varvec{e}^{k}$ denote a sub-fragment of e^k, where l is the length. Similarly, $\varvec{M}\left( {m,l} \right) = \left\{ {\varvec{q}_{m} , \ldots ,\varvec{q}_{m + l} } \right\} \subseteq \varvec{M}$ denotes a sub-fragment of M. The similarity between e^k(e, l) and M(m, l) can then be calculated by using Eq. (5), which yields a 3D matrix defined by Ma and Latecki (2011):

$$\varvec{\varGamma}_{1}^{k} \left( {e,m,l} \right) = s\left( {\varvec{e}^{k} \left( {e,l} \right),\varvec{M}\left( {m,l} \right)} \right)$$

(6)

Note that the edge-linking algorithm (Kovesi. 2000) arranged edge pixels into fragments from either clockwise or counterclockwise directions. Consequently, if the order of a candidate sub-fragment is different from that of the given reference contour, there may be a low similarity score in the 3D matrix. Therefore, another 3D matrix is established to resolve this problem:

$$\varvec{\varGamma}_{2}^{k} \left( {e,m,l} \right) = s\left( {\varvec{e}^{k} \left( {e, - l} \right),\varvec{M}\left( {m,l} \right)} \right)$$

(7)

These two matrixes are computed in an efficient manner using integral image (Donoser et al. 2009), which only took a few milliseconds per edge fragment in the experiments. Because triplets {e, m, l} with low similarity scores or short lengths may be related to non-target fragments, $\varvec{\varGamma}_{1}^{k} \left( {e,m,l} \right)$ and $\varvec{\varGamma}_{2}^{k} \left( {e,m,l} \right)$ are set to zero if their scores or lengths are below a user-defined threshold.

Any tuple {e, m} with length l, which has a large similarity value, is highly possible to belong to a valid match. Therefore, the above 3D matrices are compressed into 2D matrices along length l to maximize the similarity:

$$\left\{ {\begin{array}{*{20}c} {L_{1}^{k} \left( {e,m} \right) = \mathop {\text{argmax}}\limits_{l}\varvec{\varGamma}_{1}^{k} \left( {e,m,l} \right)} \\ {L_{2}^{k} \left( {e,m} \right) = \mathop {\text{argmax}}\limits_{l}\varvec{\varGamma}_{2}^{k} \left( {e,m,l} \right)} \\ \end{array} } \right.$$

(8)

where the value of $L_{1}^{k} \left( {e,m} \right)\varvec{ }$ or $L_{2}^{k} \left( {e,m} \right)$ is the length l. Based on Eq. (8), a function assigning a unique length for tuple {e, m} is defined as:

$$L^{k} \left( {e,m} \right) = \left\{ {\begin{array}{*{20}c} {L_{1}^{k} \left( {e,m} \right), if \varvec{\varGamma}_{1}^{k} \left( {e,m,L_{1}^{k} \left( {e,m} \right)} \right) \ge\varvec{\varGamma}_{2}^{k} \left( {e,m,L_{2}^{k} \left( {e,m} \right)} \right) } \\ { - L_{2}^{k} \left( {e,m} \right), else} \\ \end{array} } \right.$$

(9)

where L^k(e, m) is a positive value when the order of sub-fragment $\varvec{e}^{k} \left( {e,l} \right)$ is the same as the order of $\varvec{M}\left( {e,l} \right)$, otherwise it is a negative value. Function L^k can be visualized by a map, as shown in Fig. 7. This map reveals a few isolated regions in L^k, each of which contains a certain number of matches that have similar start points but different lengths. A match with greater length can average out irregularities, so the extremum l as well as its location (e, m) from each region is detected as a candidate match recorded in a list denoted as R = {{e₁, m₁, l₁}, {e₂, m₂, l₂},…}. Another list I with the same length as R is used to record the index of the edge fragment on which a candidate sub-fragment is located. An example of sub-fragments detected by PSM is given in Fig. 8.

Probabilistic Hough transform

As candidate sub-fragments detected by PSM are scattered in the input image, and some false candidate sub-fragments may exist, a reliable fragment aggregation method to form fruit candidates is required. In this study, a new PHT was developed. Unlike traditional PHT that needs to generate codebooks from training samples (Maji and Malik 2009; Chakraborty et al. 2013), the developed PHT casts probabilistic votes directly for the locations of possible fruits based on all the observations, i.e., candidate matches.

Let S(x, O) denote the score of fruit O at location x. Assume that the candidate matches are independent, then S(x, O) can be obtained by accumulating the probabilities p(x, O,{e_k, m_k, l_k}):

$$\begin{aligned} S\left( {\varvec{x},O} \right) & = \mathop \sum \limits_{k} p\left( {\varvec{x},O,\left\{ {e_{k} ,m_{k} ,l_{k} } \right\}} \right) = \mathop \sum \limits_{k} p\left( {\varvec{x},O,\varvec{e}^{{I\left\{ k \right\}}} \left( {e_{k} ,l_{k} } \right),\varvec{M}\left( {m_{k} ,\left| {l_{k} } \right|} \right)} \right) \\ & = \mathop \sum \limits_{k} p\left( {\varvec{e}^{{I\left\{ k \right\}}} \left( {e_{k} ,l_{k} } \right),\varvec{M}\left( {m_{k} ,\left| {l_{k} } \right|} \right)} \right) \times p\left( {\varvec{x},O|\varvec{e}^{{I\left\{ k \right\}}} \left( {e_{k} ,l_{k} } \right),\varvec{M}\left( {m_{k} ,\left| {l_{k} } \right|} \right)} \right) \\ \end{aligned}$$

(10)

As done by Maji and Malik (2009), the prior probability p(e^I{k}(e_k, l_k), M(m_k, |l_k|)) is assumed to be uniformly distributed, meaning that $p\left( {\varvec{e}^{{I\{ k\} }} \left( {e_{k} ,l_{k} } \right),\varvec{M}\left( {m_{k} ,|l_{k} |} \right)} \right) = \frac{1}{\left| I \right|}$, with $\left| I \right|$ the number of candidate matches. In this way, S(x, O) can be reduced to:

$$\begin{aligned} S\left( {\varvec{x},O} \right) & \propto \mathop \sum \limits_{k} p\left( {\varvec{x},O|\varvec{e}^{{I\left\{ k \right\}}} \left( {e_{k} ,l_{k} } \right),\varvec{M}\left( {m_{k} ,\left| {l_{k} } \right|} \right)} \right) \\ & = \mathop \sum \limits_{k} p\left( {\varvec{x}|O,\varvec{e}^{{I\left\{ k \right\}}} \left( {e_{k} ,l_{k} } \right),\varvec{M}\left( {m_{k} ,\left| {l_{k} } \right|} \right)} \right) \times p\left( {O|\varvec{e}^{{I\left\{ k \right\}}} \left( {e_{k} ,l_{k} } \right),\varvec{M}\left( {m_{k} ,\left| {l_{k} } \right|} \right)} \right) \\ \end{aligned}$$

(11)

where p(O| e^I{k}(e_k, l_k), M(m_k, |l_k|)) refers to the probability that {e^I{k}(e_k, l_k), M(m_k, |l_k|)} is a true match. In the experiments, p(O| e^I{k}(e_k, l_k), M(m_k, |l_k|)) was set to 1 as every candidate match {e^I{k}(e_k, l_k), M(m_k, |l_k|)} was obtained quite confidently by PSM (It would be possible to let p(O| e^I{k}(e_k, l_k), M(m_k, |l_k|)) reflect the confidence of a candidate match according to its length, such as p(e^I{k}(e_k, l_k), M(m_k, |l_k|)) = $\frac{{\left| {l_{k} } \right|}}{{\mathop \sum \nolimits_{j} \left| {l_{j} } \right|}}$, as a longer candidate is more likely to belong to a valid match). Thus:

$$S\left( {\varvec{x},O} \right) \propto \mathop \sum \limits_{k} p\left( {\varvec{x}|O,\varvec{e}^{{I\left\{ k \right\}}} \left( {e_{k} ,l_{k} } \right),\varvec{M}\left( {m_{k} ,\left| {l_{k} } \right|} \right)} \right)$$

(12)

where p(x|O, e^I{k}(e_k, l_k), M(m_k, |l_k|)) is the probability of a fruit center location, which can be estimated by accumulating the dissimilarity between the distance from each point on an observed sub-fragment to location x and that from the corresponding point on a reference contour to the center of the reference contour. Due to the fact that the point-to-line distance is more robust than Euclidean distance (Low 2004), the following point-to-line based dissimilarity metric is used:

$$d_{k,\left| j \right|} = \log \left( {1 + \left| {\left( {\varvec{p}_{{e_{k} + j}}^{{I\left\{ k \right\}}} - \varvec{x}} \right)^{\text{T}} \varvec{n}_{{e_{k} + j}}^{{I\left\{ k \right\}}} } \right|} \right) - { \log }\left( {1 + \left| {\left( {\varvec{q}_{{m_{k} + \left| j \right|}} - \varvec{c}_{M} } \right)^{\text{T}} \varvec{n}_{{m_{k} + \left| j \right|}}^{M} } \right|} \right)$$

(13)

where d_k,|j| refers to the dissimilarity between the |j|’th point pair of the k’th match; $\varvec{p}_{{e_{k} + j}}^{{I\left\{ k \right\}}}$ and $\varvec{n}_{{e_{k} + j}}^{{I\left\{ k \right\}}}$ are the location and normal vector of the (e_k+ j)’th point on the I{k}’th edge fragment, respectively; $\varvec{q}_{{m_{k} + \left| j \right|}}$ and $\varvec{n}_{{m_{k} + \left| j \right|}}^{M}$ are the location and normal vector of the (m_k+ |j|)’th point on the reference contour, respectively; c_M is the center of the reference contour. Figure 9 shows the schematic diagram of the developed dissimilarity metric.

Based on Eq. (13), probability p(x|O, e^I{k}(e_k, l_k), M(m_k, |l_k|)) is defined as follows:

$$p\left( {\varvec{x}|O,\varvec{e}^{{I\left\{ k \right\}}} \left( {e_{k} ,l_{k} } \right),\varvec{M}\left( {m_{k} ,\left| {l_{k} } \right|} \right)} \right) = { \exp }\left( { - \frac{{\varvec{d}_{k}^{\text{T}} \varvec{d}_{k} }}{{\left| {l_{k} } \right|\sigma^{2} }}} \right)$$

(14)

where $\varvec{d}_{k} = \left( {d_{k,1} ,d_{k,2} , \ldots ,d_{{k,\left| {l_{k} } \right|}} } \right)^{\text{T}}$; $\sigma$ is an important constant. If σ is too large, the resulting probabilities will be relatively large and similar, thus losing discrimination; if σ is too small, some true-positive matches will have low probabilities due to disturbance. In the experiments, $\sigma$ was set to 0.1.

To minimize the computation time, the range of x for each match (e^I{k}(e_k, l_k), M(m_k, |l_k|)) is limited to a window twice the size of the reference model and centered at the center of e^I{k}(e_k, l_k). A map describing the probability distribution of the locations of fruit centers is obtained after the PHT process is completed (see Fig. 10a). This map is segmented by a given threshold, and a non-maximal suppression procedure is performed to remove repetitive candidates (see Fig. 10b).

Features extraction and classification

Because some false positives exist after PHT, a feature vector for each candidate is extracted and classified via an SVM classifier for removing false positives. Where there is low-contrast between fruit and leaves, such as bitter gourd, texture features are more discriminative than color features. Where there is high-contrast with leaves, such as citrus, color features are preferable. In this context, a color and texture-based descriptor, named color-HOG, is proposed to describe all kinds of fruits. The color component of color-HOG comprises the mean and covariance of the HSV values in a window centered at a possible fruit center. As the covariance is a symmetric matrix, duplicate elements are not included in the color-HOG features. The texture component of color-HOG is histograms of oriented gradients (HOG) (Dalal and Triggs 2005), of which each cell is set to $8 \times 8$ pixels and each block is $16 \times 16$ pixels.

As a comparison, Linker et al. (2012) proposed a similar color and texture-based descriptor comprised of the statistics of RGB values in a square window centered at the pixel of interest. This descriptor is sensitive to illumination change, because varying illuminations would alter the RGB values of objects. The proposed descriptor utilizes the HOG features extracted from the gradient image which is invariant to lighting conditions, and thus relatively discriminative.

Results and discussion

Two quantitative experiments were performed, one for verifying the performance of the color-HOG descriptor, and one for examining the detection performance of the proposed algorithm. All codes were implemented using MATLAB (R2014a) on a computer with 4 GB RAM.

Evaluation of color-HOG descriptor

The color-HOG descriptor was evaluated by comparing two state-of-the-art descriptors, HOG (Dalal and Triggs 2005) and LBP (Ahonen et al. 2009). These three descriptors were utilized to train the SVM classifier, and precision and recall were used to evaluate the classification performance. Both positive and negative samples were obtained manually from one-third of the images of each dataset, among which 75% of samples were used as a training set, and the others were a test set. Note that 10-fold cross validation and a grid search method were utilized to optimize the parameters of SVM. Table 2 lists the experiment results of the three descriptors on the test set, which clearly showed that color-HOG outperformed HOG and LBP in terms of precision and recall.

Table 2 Precision and recall of the color-HOG, HOG and LBP descriptors

Full size table

Evaluation of detection performance of the proposed algorithm

The performance of the proposed algorithm (named “PSMPHT”) was verified and compared with two methods, one was a similar contour-based detection method (Ma and Latecki 2011), and one was a modified version of the developed method by replacing PHT with a graph clustering method (Bulò and Pelillo 2013), named “PSMGC”. A detection was considered correct if the intersection of the detected and the ground-truth bounding box over their union was larger than 50% (Everingham et al. 2015). Precision and recall were used to assess the detection performance. Note that since one-third of the images per category were used to train the SVM, only the remaining images were subject to this experiment.

Tables 3 and 4 list the precision and recall of the three methods. PSMPHT had two best precisions and four best recalls, PSMGC obtained three best precisions and two best recalls, and Ma and Latecki’s method only had two best precisions, i.e., both PSMPHT and PSMGC obviously outperformed Ma and Latecki’s method. The main reason is that PSMPHT and PSMGC use the proposed bidirectional PSM, which searches the sub-fragments of interest along the input fragment from both the clockwise and anticlockwise directions, thus obtaining more confident candidate sub-fragments than the unidirectional PSM used by Ma and Latecki (2011). The developed shape descriptor encodes the tangent and normal properties of the fragment, which also contributes to the favorable performance of PSMPHT and PSMGC.

Table 3 Precision of PSMPHT, PSMGC, and Ma and Latecki’s method

Full size table

Table 4 Recall of PSMPHT, PSMGC, and Ma and Latecki’s method

Full size table

PSMPHT performed better than PSMGC in terms of the overall performance in this study. Because both PSMPHT and PSMGC use the proposed PSM to detect sub-fragments of interest, the performance difference between the two methods mainly lies in the aggregation method as-applied. As each match independently casts votes for the locations of fruit candidates, PHT can generate a multi-peak probabilistic map from which the local maxima are detected as fruit candidates. This characteristic allows PHT to obtain more confident candidates than GC and therefore enabled PSMPHT with larger precision and recall than GC. However, there was an exception. In the pumpkin category, as the pumpkin and its leaves have similar shapes, PHT would detect many false-positive candidates (Fig. 11a). This issue decreased the precision and recall of PSMPHT over the pumpkin category. A failure example is shown in Fig. 11. In general, PSMPHT has better overall performance than PSMGC in natural environments.

The mango dataset results showed that these three algorithms achieved high precision, as the nighttime environment inhibited the effects of illumination changes and thus improved the SVM classifier performance. Additionally, PSMPHT obtained a higher recall than the other algorithms. The main reason was that only one sub-fragment on the mango contour was detected as the mangoes appeared very small in the captured image, so GC could not find reliable maximum weight subgraphs from the constructed weighted graph, but PHT could still cast reliable votes for the fruit location. This result again confirmed that PSMPHT is highly effective.

Several detection results are shown in Fig. 12. These examples intuitively validated that PSMPHT is effective. Some advantages are presented: (a) the proposed algorithm could detect any type of fruits, such as green, orange, circular, non-circular, etc., i.e., it was a generalized algorithm, and (b) the proposed algorithm was robust under complex conditions, such as occlusion and illumination changes (see Fig. 12ai), low contrast between fruits and leaves (see Fig. 12d) and cluttered background. Also, a disadvantage was uncovered: the developed method would obtain overlapping detections (see Fig. 12aii) or miss some true positives (see Fig. 12eiii). A possible reason is that the proposed PHT uses a scale-variant dissimilarity metric to cast probabilistic votes, so PHT would generate multiple local maximums for large-sized fruits or is unable to detect fruits with large scale changes. Therefore, future research will focus on improving the scale invariance of PHT.

Conclusions

This paper presented a framework for detecting a wide array of fruit types—green, orange, circular and non-circular were tested here—in a natural environment. Detection of sub-fragments of interest was performed using a bidirectional PSM technique, which traces the input fragment from two opposite directions to find possible sub-fragments matching parts of a given reference model. These sub-fragments of interest were grouped as fruit candidates using PHT. False positives were excluded by an SVM classifier.

The proposed algorithm was evaluated on image datasets captured in the natural environment. The datasets comprised citrus, tomato, pumpkin, bitter gourd, towel gourd and mango images, i.e., several different types of fruits. For citrus, tomato, pumpkin, bitter gourd, towel gourd and mango categories, the precision of the proposed algorithm was 0.783, 0.848, 0.745, 0.762, 0.807 and 0.919, respectively; the recall was 0.802, 0.582, 0.678, 0.791, 0.754 and 0.825, respectively. The results are comparable to a similar contour-based detection method. In other words, the proposed algorithm has competitive advantages.

PSM and PHT were used for sub-fragment detection and aggregation without necessitating the painstaking design of specific features for each type of fruit. This makes the proposed algorithm a generalized method. PHT utilizes a scale-variant dissimilarity metric to determine the probability value of a vote, so it may fail to detect fruits with large scale change. In the future, a scale-invariant PHT will be established.

References

Ahonen, T., Matas, J., He, C., & Pietikainen, M. (2009). Rotation invariant image description with local binary pattern histogram fourier features. In Proceedings of the 16th Scandinavian Conference on Image Analysis (pp. 61–70). Heidelberg, Germany: Springer.
Arbelaez, P., Maire, M., Fowlkes, C., & Malik, J. (2011). Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence,33(5), 898–916.
Article PubMed Google Scholar
Bargoti, S., & Underwood, J. P. (2017). Image segmentation for fruit detection and yield estimation in apple orchards. Journal of Field Robotics,34(6), 1039–1060.
Article Google Scholar
Barnea, E., Mairon, R., & Ben-Shahar, O. (2016). Colour-agnostic shape-based 3d fruit detection for crop harvesting robots. Biosystems Engineering,146, 57–70.
Article Google Scholar
Bulò, S. R., & Pelillo, M. (2013). A game-theoretic approach to hypergraph clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence,35(6), 1312–1327.
Article Google Scholar
Chakraborty, B., Gonzàlez, J., & Roca, F. X. (2013). Large scale continuous visual event recognition using max-margin Hough transformation framework. Computer Vision and Image Understanding,117(10), 1356–1368.
Article Google Scholar
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 886–893). Los Alamitos, USA: IEEE Computer Society Press.
Donoser, M., Riemenschneider, H., & Bischof, H. (2009). Efficient partial shape matching of outer contours. In Proceedings of the 9th Asian Conference on Computer Vision (pp. 281–292). Heidelberg, Germany: Springer.
Everingham, M., Eslami, S. M. A., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2015). The Pascal Visual Object Classes challenge: A retrospective. International Journal of Computer Vision,111(1), 98–136.
Article Google Scholar
Fan, H., Yang, C., & Tang, Y. (2015). Object detection based on scale-invariant partial shape matching. Machine Vision and Applications,26(6), 711–721.
Article Google Scholar
Fergus, R., Perona, P., & Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 264–271). Los Alamitos, USA: IEEE Computer Society Press.
Font, D., Pallejà, T., Tresanchez, M., Teixidó, M., Martinez, D., Moreno, J., et al. (2014). Counting red grapes in vineyards by detecting specular spherical reflection peaks in RGB images obtained at night with artificial illumination. Computers & Electronics in Agriculture,108(C), 105–111.
Article Google Scholar
Gongal, A., Amatya, S., Karkee, M., Zhang, Q., & Lewis, K. (2015). Sensors and systems for fruit detection and localization: A review. Computers & Electronics in Agriculture,116(C), 8–19.
Article Google Scholar
He, A. X., & Yung, N. H. C. (2008). Corner detector based on global and local curvature properties. Optical Engineering,47(5), 057008.
Article Google Scholar
Herold, B., Truppel, I., Zude, M., & Geyer, M. (2005). Spectral measurements on ‘elstar’ apples during fruit development on the tree. Biosystems Engineering,91(2), 173–182.
Article Google Scholar
Ji, W., Zhao, D., Cheng, F., Xu, B., Zhang, Y., & Wang, J. (2012). Automatic recognition vision system guided for apple harvesting robot. Computers & Electrical Engineering,38(5), 1186–1195.
Article Google Scholar
Kapach, K., Barnea, E., Mairon, R., Edan, Y., & Ben-Shahar, O. (2012). Computer vision for fruit harvesting robots—State of the art and challenges ahead. International Journal of Computational Vision & Robotics,3(1/2), 4–34.
Article Google Scholar
Kovesi, P. D. (2000). Matlab and octave functions for computer vision and image processing. http://www.peterkovesi.com/matlabfns. Accessed 25 August 2017.
Kurtulmus, F., Lee, W. S., & Vardar, A. (2014). Immature peach detection in colour images acquired in natural illumination conditions using statistical classifiers and neural network. Precision Agriculture,15(1), 57–79.
Article Google Scholar
Li, H., Lee, W. S., & Wang, K. (2016). Immature green citrus fruit detection and counting based on fast normalized cross correlation (FNCC) using natural outdoor colour images. Precision Agriculture,17(6), 678–697.
Article Google Scholar
Linker, R. (2017). A procedure for estimating the number of green mature apples in night-time orchard images using light distribution and its application to yield estimation. Precision Agriculture,18(1), 59–75.
Article Google Scholar
Linker, R., Cohen, O., & Naor, A. (2012). Determination of the number of green apples in RGB images recorded in orchards. Computers and Electronics in Agriculture,81(1), 45–57.
Article Google Scholar
Liu, H., Latecki, L. J., & Yan, S. (2010). Robust clustering as ensembles of affinity relations. Advances in Neural Information Processing Systems,23, 1414–1422.
Google Scholar
Low, K. (2004). Linear least-squares optimization for point-toplane ICP surface registration. Technical report, TR04-004, University of North Carolina, Chapel Hill, USA.
Lu, C., Latecki, L. J., Adluru, N., Yang, X., & Ling, H. (2009). Shape guided contour grouping with particle filters. In Proceedings of the IEEE 12th International Conference on Computer Vision (pp. 2288–2295). Los Alamitos, CA, USA: IEEE Computer Society Press.
Lu, J., Lee, W. S., Gan, H., & Hu, X. (2018). Immature citrus fruit detection based on local binary pattern feature and hierarchical contour analysis. Biosystems Engineering,171, 78–90.
Article Google Scholar
Lu, J., & Sang, N. (2015). Detecting citrus fruits and occlusion recovery under natural illumination conditions. Computers and Electronics in Agriculture,110(C), 121–130.
Article Google Scholar
Luo, L., Tang, Y., Zou, X., Wang, C., Zhang, P., & Feng, W. (2016). Robust grape cluster detection in a vineyard by combining the adaboost framework and multiple color components. Sensors,16(12), 2098.
Article Google Scholar
Ma, T., & Latecki, L. J. (2011). From partial shape matching through local deformation to robust global shape similarity for object detection. In Proceedings of the 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 1441–1448). Los Alamitos, CA, USA: IEEE Computer Society Press.
Maire, M., Arbelaez, P., Fowlkes, C., & Malik, J. (2008). Using contours to detect and localize junctions in natural images. Computer Vision and Pattern Recognition,11, 1–8.
Google Scholar
Maji, S., & Malik, J. (2009). Object detection using a max-margin Hough transform. In Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp 1038–1045). Los Alamitos, CA, USA: IEEE Computer Society Press.
Murillo-Bracamontes, E. A., Martinez-Rosas, M. E., Miranda-Velasco, M. M., Martinez-Reyes, H. L., Martinez-Sandoval, J. R., & Cervantes-De-Avila, H. (2012). Implementation of Hough transform for fruit image segmentation. Procedia Engineering,35(12), 230–239.
Article Google Scholar
Ommer, B., & Malik, J. (2009). Multi-scale object detection by clustering lines. In Proceedings of the IEEE 12th International Conference on Computer Vision (pp. 484–491). Los Alamitos, CA, USA: IEEE Computer Society Press.
Opelt, A., Pinz, A., & Zisserman, A. (2006). A boundary-fragment-model for object detection. In Proceedings of the 9th European Conference on Computer Vision (pp. 575–588). Heidelberg, Germany: Springer.
Payne, A., Walsh, K., Subedi, P., & Jarvis, D. (2014). Estimating mango crop yield using image analysis using fruit at ‘stone hardening’ stage and night time imaging. Computers and Electronics in Agriculture,100, 160–167.
Article Google Scholar
Pearl, J. (1982). Reverend bayes on inference engines: a distributed hierarchical approach. In Proceedings of the Second AAAI Conference on Artificial Intelligence (pp. 133–136). Cambridge, UK: AAAI Press.
Puttemans, S., Vanbrabant, Y., Tits, L., & Goedeme, T. (2016). Automated visual fruit detection for harvest estimation and robotic harvesting. In Proceedings of the Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA) (pp. 1–6). Los Alamitos, CA, USA: IEEE Computer Society Press.
Qureshi, W. S., Payne, A., Walsh, K. B., Linker, R., Cohen, O., & Dailey, M. N. (2016). Machine vision for counting fruit on mango tree canopies. Precision Agriculture,17(3), 1–21.
Google Scholar
Riemenschneider, H., Donoser, M., & Bischof, H. (2010). Using Partial Edge Contour Matches for Efficient Object Category Localization. In Proceedings of the 11th European Conference on Computer Vision (pp. 29–42). Heidelberg, Germany: Springer.
Shotton, J., Blake, A., & Cipolla, R. (2008). Multiscale categorical object recognition using contour fragments. IEEE Transactions on Pattern Analysis and Machine Intelligence,30(7), 1270–1281.
Article PubMed Google Scholar
Slaughter, D. C., & Harrell, R. C. (1989). Discriminating fruit for robotic harvest using color in natural outdoor scenes. Transactions of the ASAE,32(2), 0757–0763.
Article Google Scholar
Su, Y., Liu, Y., Cuan, B., & Zheng, N. (2015). Contour guided hierarchical model for shape matching. In Proceedings of the 2015 IEEE International Conference on Computer Vision (pp. 1609–1617). Los Alamitos, CA, USA: IEEE Computer Society Press.
Tico, M., & Kuosmanen, P. (2003). Fingerprint matching using an orientation-based minutia descriptor. IEEE Transactions on Pattern Analysis and Machine Intelligence,25(8), 1009–1014.
Article Google Scholar
Wachs, J. P., Stern, H. I., Burks, T., & Alchanatis, V. (2010). Low and high-level visual feature-based apple detection from multi-modal images. Precision Agriculture,11(6), 717–735.
Article Google Scholar
Wang, Q., Nuske, S., Bergerman, M., & Singh, S. (2013). Automated crop yield estimation for apple orchards. In J. Desai, G. Dudek, O. Khatib, & V. Kumar (Eds.), Experimental robotics. Springer Tracts in Advanced Robotics (Vol. 88). Heidelberg, Germany: Springer.
Google Scholar
Wang, C., Zou, X., Tang, Y., Luo, L., & Feng, W. (2016). Localisation of litchi in an unstructured environment using binocular stereo vision. Biosystems Engineering,145, 39–51.
Article Google Scholar
Xiang, R., Jiang, H., & Ying, Y. (2014). Recognition of clustered tomatoes based on binocular stereo vision. Computers and Electronics in Agriculture,106, 75–90.
Article Google Scholar
Yu, Q., Wei, H., & Yang, C. (2017). Local part chamfer matching for shape-based object detection. Pattern Recognition,65, 82–96.
Article Google Scholar
Zhao, C., Lee, W. S., & He, D. (2016). Immature green citrus detection based on colour feature and sum of absolute transformed difference (SATD) using colour images in the citrus grove. Computers and Electronics in Agriculture,124, 243–253.
Article Google Scholar
Zou, X., Ye, M., Luo, C., Xiong, J., Luo, L., Wang, H., et al. (2016). Fault-tolerant design of a limited universal fruit-picking end-effector based on vision-positioning error. Applied Engineering in Agriculture,32(1), 5–18.
Article Google Scholar
Zou, X., Zou, H., & Lu, J. (2012). Virtual manipulator-based binocular stereo vision positioning system and errors modelling. Machine Vision and Applications,23(1), 43–63.
Article Google Scholar

Download references

Acknowledgements

This work was funded by a grant from the National Natural Science Foundation of China (No. 31571568), and the Project of Province Science and Technology of Guangdong (No. 2017A030222005).

Author information

Authors and Affiliations

Key Laboratory of Key Technology on Agricultural Machine and Equipment, Ministry of Education, South China Agricultural University, 483 Wushan Road, Guangzhou, 510642, China
Guichao Lin, Xiangjun Zou, Jiabing Cheng & Juntao Xiong
College of Mechanical and Electrical Engineering, Chuzhou University, 1 Huifeng West Road, Chuzhou, 239000, China
Guichao Lin
College of Urban and Rural Construction, Zhongkai University of Agriculture and Engineering, Guang Xin Road, Guangzhou, 510225, China
Yunchao Tang

Authors

Guichao Lin
View author publications
You can also search for this author in PubMed Google Scholar
Yunchao Tang
View author publications
You can also search for this author in PubMed Google Scholar
Xiangjun Zou
View author publications
You can also search for this author in PubMed Google Scholar
Jiabing Cheng
View author publications
You can also search for this author in PubMed Google Scholar
Juntao Xiong
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Yunchao Tang or Xiangjun Zou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Lin, G., Tang, Y., Zou, X. et al. Fruit detection in natural environment using partial shape matching and probabilistic Hough transform. Precision Agric 21, 160–177 (2020). https://doi.org/10.1007/s11119-019-09662-w

Download citation

Published: 24 April 2019
Issue Date: February 2020
DOI: https://doi.org/10.1007/s11119-019-09662-w

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Fruit detection in natural environment using partial shape matching and probabilistic Hough transform

Abstract

Similar content being viewed by others

Color-, depth-, and shape-based 3D fruit detection

Shape-based Fruit Recognition and Classification

Automatic Fruit Image Recognition System Based on Shape and Color Features