12.1 Introduction

To date, most tree fruit production operations depend heavily on seasonal human labor. Many critical activities are not only labor-intensive but also highly time-sensitive. With growing concern over labor shortages and the associated high labor costs, harvesting, the most labor-intensive operation in tree fruit production, has been attracting increasing attention. Improving harvesting efficiency and reducing dependence on human workers have been the major motivations for developing new harvesting technologies. In recent decades, automation technologies, especially auto-guidance for field tractors, have been investigated widely. For specialty crops, including tree fruit crops, however, the application of automation has lagged because of the complexity of field operations and the inconsistency of cropping systems. Three harvesting technologies have been investigated for tree fruit: harvest assist platforms, mass mechanical harvesting, and robotic harvesting. Harvest assist platforms significantly improve harvesting efficiency (Schupp et al., 2011; Zhang et al., 2016), but a large amount of human labor is still needed. Mechanical harvesting based on the shake-and-catch concept, which removes fruit in bulk and non-selectively, achieves higher harvesting efficiency but may cause more bruising of the fruit (He et al., 2017; Ma et al., 2018). Robotic harvesting, as a selective method, shows potential for replacing manual hand picking (Silwal et al., 2016; Hohimer et al., 2019). The two major components of robotic harvesting are fruit detection with a machine vision system and fruit picking with robotic maneuvering mechanisms and arms. Tree architecture is another core factor because it governs the canopy–robot interaction. 
With the adoption of narrow tree canopy systems, especially two-dimensional trellis-trained systems, robotic harvesting has become more promising than it was for traditional trees. The interaction between harvester and tree canopy requires optimal path planning so that the robot avoids obstacles while reaching the targeted fruit.

12.2 Tree Fruit Industry and Current Challenges

12.2.1 Overview of Tree Fruit Industry in USA

The tree fruit industry is an important component of the nation’s agricultural sector, contributing about 25% of the market share ($18 billion) of all specialty crops produced in the USA (USDA-ERS, 2018). Production of major tree fruits in the USA is shown in Fig. 12.1. Citrus fruits are the top fruit crops in world trade by value (FAOSTAT, 2016) and are among the most popular fruit commodities, widely accepted for their flavor and nutritional value. Fresh fruit and processed products (mainly juice) are the two major markets for US citrus. Fruit for fresh consumption is grown mainly in California, Arizona, and Texas, whereas Florida supplies almost the entire processed citrus market for orange juice. California produced about 51% of US citrus in the 2018–2019 season, Florida accounted for 44%, and Texas and Arizona shared the remaining 5% (USDA-NASS, 2019a). The 7.94 million tons of citrus fruit (valued at $3.35 billion) produced in 2018–2019 was 31% higher than in the 2017–2018 season (USDA-NASS, 2019a). Apples are the second most produced fruit after oranges and among the most valuable fruit crops in the USA. Apples are grown commercially in 32 states, but Washington is by far the largest producer, accounting for 70% of total apple production. New York, Michigan, Pennsylvania, and California are the next four largest producers (U.S. Apple Association, 2018). Nearly 7500 growers produced around 4.95 million tons of apples (valued at $3.01 billion) on approximately 130.3 thousand hectares of land in 2018–2019 (USDA-NASS, 2019b). Pears are grown mainly in six states: California, Michigan, New York, Oregon, Pennsylvania, and Washington. Of these, California, Oregon, and Washington account for the majority of pear production every year. 
Pears contributed $429 million to the economy, with total production of 0.8 million tons in the 2018–2019 season (USDA-NASS, 2019b). Peaches are the fourth most produced tree fruit in the USA, with 0.64 million tons valued at $511 million in 2018–2019. Peaches are grown commercially in 20 states; California is the largest producer, supplying about 56% of US fresh peaches and nearly 96% of processed peaches (USDA-NASS, 2018). Other top producing states are South Carolina, Georgia, and New Jersey. Almost 90% of sweet cherries are produced in three states (Washington, California, and Oregon), and 74% of tart cherries are produced by Michigan alone (USDA-NASS, 2018). US cherry growers produced 0.34 million tons of sweet cherries (valued at $638 million) and 0.15 million tons of tart cherries ($57 million) in 2018–2019 (USDA-NASS, 2019b). Although tree fruit production has increased significantly over the past decade because of improved orchard management, the US fruit industry faces tremendous challenges because its heavy dependence on farm labor is driving up production costs (Fennimore & Doohan, 2008; Calvin & Martin, 2010).

Fig. 12.1

Production of major tree fruits (million tons) in the USA (USDA-NASS, 2019a, b)

Among the costs associated with tree fruit production, harvesting (picking and hauling only) accounts for 11–26% of total production costs. Harvesting cost varies from one fruit to another and also depends on orchard size. For citrus such as oranges, picking and hauling cost $926 per acre, about 11% of the total production cost (University of California Cooperative Extension, 2015). For apples, the picking and hauling cost is much higher, $1320 per acre, or 26% of the total production cost (University of California Cooperative Extension, 2014). Similarly, peaches require $1339 per acre for picking and hand sorting (University of California Agriculture and Natural Resources Cooperative Extension, 2017a, b). Pears cost $1780–$1969 per acre, about 20–25% of the total production cost (University of California Agriculture and Natural Resources Cooperative Extension, 2018). Hand picking of cherries costs $720–$960 per acre (University of California Agriculture and Natural Resources Cooperative Extension, 2017a, b).
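The per-acre costs and percentage shares quoted above imply a total production cost for each crop. As a rough check, the following Python sketch back-calculates that total for the two crops where a single cost and a single share are both given (orange and apple). The function name `implied_total_cost` is illustrative and does not come from the cited cost studies, and the results are only as precise as the rounded percentages.

```python
# Picking-and-hauling cost (USD per acre) and its share of total production
# cost, as quoted in the text for orange and apple.
reported = {
    "orange": (926.0, 0.11),
    "apple": (1320.0, 0.26),
}


def implied_total_cost(harvest_cost, share):
    """Total production cost implied by a harvest cost and its share."""
    if not 0.0 < share <= 1.0:
        raise ValueError("share must be in (0, 1]")
    return harvest_cost / share


for crop, (cost, share) in reported.items():
    total = implied_total_cost(cost, share)
    print(f"{crop}: ${cost:.0f}/acre is {share:.0%} of about ${total:.0f}/acre")
```

Under these figures, apple harvesting is more expensive per acre than orange harvesting in absolute terms and also takes a much larger share of a smaller implied total.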

12.2.2 Challenges and Opportunities for Fruit Harvesting

Harvesting tree fruits (i.e., apples, citrus, cherries, peaches, and pears) is the process of gathering ripe fruit from the orchard. It depends heavily on manual labor and is becoming less feasible as the farm labor force shrinks and production costs rise. Although agricultural automation developed rapidly in the twentieth century, tree fruit harvesting still relies largely on manual labor because efficient and effective harvesting methods are lacking. Most systems reported in the last few years are at the prototype stage and are not yet practical for large-scale orchards because of low efficacy and efficiency, unreliable performance, and high cost (Zhao et al., 2016). The apple industry alone spends $1150–$1700 per acre on manual harvesting (i.e., hand picking) by seasonal workers (Gallardo et al., 2010). A large number of seasonal workers is therefore required every year just for harvesting, the most labor-intensive task in orchard management. Fruit growers in Washington State employed about 36,425 seasonal workers in the peak harvest month (September) for apple harvesting alone (Washington State Employment Security Department, 2013), accounting for one-third of annual variable costs when combined with tree pruning and thinning (Gallardo et al., 2010). At the same time, increasing demand for seasonal workers in the fruit industry portends high uncertainty about farm labor availability in the near future (Calvin & Martin, 2010). The US tree fruit industry hires a major portion of its seasonal labor from migrant Latino populations, whose numbers have also been declining in the past few years (Gonzalez-Barrera, 2015), raising serious concern among fruit growers about harvesting in the coming years. 
In addition, most tree fruits are hand picked by workers using a ladder and bag, which poses a high risk of back strain and musculoskeletal injuries because of manual lifting, repetitive hand motions, and awkward postures during picking (Fathallah, 2010). The main cause of musculoskeletal injury is ascending and descending ladders with heavy loads. Ladder-related injuries accounted for about $21 million in compensation between 1996 and 2001, 50% of all compensation claimed in the Washington State fruit industry over that period (Hofmann et al., 2006). To reduce injuries when picking fruit at height, labor assist systems (i.e., mechanical platforms) were commercialized that raise the pickers and bring the bins close to them; however, adoption of these technologies is not widespread among US tree fruit growers (Robinson et al., 2013). Only about 11% of fruit growers in Washington State used mechanical labor assist systems for harvesting (Gallardo & Brady, 2015). Incompatibility between mechanical labor assist systems and existing orchard designs and tree architectures has been cited as the most significant obstacle to their use (Duraj et al., 2010). To address labor shortages, the risk of worker injury, and the limitations of labor assist systems, and to reduce harvesting cost and time, the development and application of automatic or robotic harvesting is of utmost importance, building on the advances in sensors, horticultural practices, and mechanical technologies of the past decades. Figure 12.2 illustrates the evolution of tree fruit harvesting methods from manual picking to robotic harvesting.

Fig. 12.2

Illustration of the evolution of tree fruit harvesting methods. From left to right: manual picking, harvest assist platform, mechanical shake-and-catch, and robotic harvesting

12.3 Overview of Robotic Harvesting Technologies

Besides robotic harvesting, the use of harvest assist platforms for tree fruit can be traced back to the 1990s. Peterson and Miller (1996) developed a harvest aid that positioned two pickers strategically under the tree canopy; their primary task was to pick apples and drop them onto a padded catching surface. The machine was later modified for narrow inclined trellises, allowing the pickers to move freely and optimize their picking time. Field tests demonstrated the potential to improve worker productivity by up to 22% and to remove culls effectively in the orchard (Peterson & Bennedsen, 2005). However, the incidence of apple damage was unacceptably high, and the handling components required refinement.

Vibration, or shaking, is the most widely used mechanical harvesting method. It transmits kinetic energy to fruiting branches, generating a detaching force at the fruit–stem interface that removes the fruit from the tree (Erdoğan et al., 2003). During shaking, a tree responds differently to different excitation frequencies and amplitudes, and fruit removal occurs when the induced detachment force exceeds the tensile strength of the fruit pedicel (Markwardt et al., 1964). Upadhyaya et al. (1981) studied a single-degree-of-freedom model of the response of a tree to impact input and found that 50–60% of the mechanical energy was converted to kinetic energy when impact excitation was used. Savary et al. (2010) developed a simulation tool for predicting the interaction between a tree and a shaker using finite element analysis. Experimental results showed that the resultant acceleration of the tree increases with shaking frequency. Du et al. (2012) conducted a series of dynamic tests to determine the energy responses of a sweet cherry tree to vibratory excitation in both laboratory and orchard environments and found that energy delivery efficiency and its distribution pattern depended heavily on tree structure. Recently, a localized multi-layer shake-and-catch harvesting system was developed and tested in apple orchards, demonstrating the possibility of reducing mechanically induced fruit damage (He et al., 2019). Mechanical harvesting is nevertheless non-selective, and where selectivity is needed a more precise method, such as robotic harvesting, should be applied.
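The detachment criterion described above (fruit is removed when the induced force exceeds the pedicel tensile strength) can be illustrated with a deliberately simplified calculation. The sketch below assumes the fruit moves rigidly with a branch that oscillates sinusoidally, so the peak inertial force is m·A·(2πf)². This is a textbook idealization, not the model of any study cited here, and the mass, amplitude, and pedicel strength values are hypothetical.

```python
import math


def peak_inertial_force(mass_kg, amplitude_m, frequency_hz):
    """Peak inertial force (N) on a fruit moving sinusoidally with its branch."""
    omega = 2.0 * math.pi * frequency_hz  # angular frequency, rad/s
    return mass_kg * amplitude_m * omega ** 2


def detaches(mass_kg, amplitude_m, frequency_hz, pedicel_strength_n):
    """True when the peak inertial force exceeds the pedicel tensile strength."""
    force = peak_inertial_force(mass_kg, amplitude_m, frequency_hz)
    return force > pedicel_strength_n


# Hypothetical values: 0.2 kg fruit, 20 mm shaking amplitude, 10 N pedicel strength.
for freq in (5.0, 10.0):
    force = peak_inertial_force(0.2, 0.02, freq)
    print(f"{freq:.0f} Hz: {force:.2f} N, detaches={detaches(0.2, 0.02, freq, 10.0)}")
```

Because the force scales with the square of frequency, doubling the shaking frequency quadruples the peak force in this idealization, which is consistent with the reported increase of tree acceleration with shaking frequency. Real trees add damping, resonance, and branch flexibility that this sketch ignores.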

12.3.1 Concept of Robotic Harvesting

The use of robots in tree fruit production is motivated primarily by decreasing labor availability and increasing labor costs. An agricultural robot can be defined as an integration of sensing, computing, and manipulation systems that executes pre-defined tasks such as thinning, pruning, and harvesting (Kondo & Ting, 1998). In the tree fruit production cycle, harvesting is the most labor-intensive operation. Because fruit harvesting is time-sensitive, a large seasonal workforce of skilled labor is required, which concerns fruit growers as labor availability declines. In addition, harvest labor accounts for a significant portion of variable production cost. Robotic harvesting is therefore an alternative solution that addresses labor availability, the associated costs, and timeliness. Robotic harvesters can be classified into two categories: bulk (mass) harvesting and selective (ripe/ready) harvesting. Selective harvesting, in which only ripe fruit is picked, has received more attention from researchers. As a result, several robotic systems have been developed for harvesting various tree fruits, including apples (Silwal et al., 2016), citrus (Mehta & Burks, 2014), and cherries (Tanigaki et al., 2008), but none has yet achieved commercial success. With recent advances in sensing, control, and computing capabilities, robotic tree fruit harvesting is becoming a viable long-term technology for ensuring the sustainability of the tree fruit industry. This section gives a general overview of the components of an integrated robotic tree fruit harvesting system (Fig. 12.3), along with some recent development efforts; the core technologies are discussed in detail in the next section.

Fig. 12.3

Illustration of the Integrated Robotic Tree Fruit Harvester (Apple). Components: (1) manipulator, (2) camera vision system, and (3) end-effector tool (gripper)

12.3.2 Robotic Harvesting Review

In recent years, many researchers have worked on integrated robotic harvesting systems for different tree fruits, including apples, citrus, litchi, and cherries. However, these robotic systems are still in the development phase. A universal robotic system may not be feasible because harvesting principles vary among fruits with canopy characteristics and with fruit attributes such as size, shape, and weight. Robotic systems have therefore been developed with various combinations of sensing systems, manipulators, and end-effectors. Among high-value tree fruits, robotic harvesting of apples has received the most attention. Figure 12.4 shows three different apple robotic harvesting systems. Modern canopy architectures for apple orchards, such as the trellised fruiting wall, V-trellis, and tall spindle, make most of the fruit visible and accessible from outside the canopy, which has encouraged research on automated apple harvesting. Apple features such as shape, size, and color (especially in red varieties) are comparatively easy to detect, and the firmness of the fruit helps robotic harvesting because the end-effector can pick it without damage or bruising. Silwal et al. (2016) developed an apple harvesting robot using a seven-DoF robotic system integrated with a gripper end-effector with three tandem fingers and an over-the-row, time-of-flight-based color camera. For control, the system used a global camera view, taking images at the start of each harvesting cycle. The system detected 90–100% of the fruit; however, picking success was 84%, with an average speed of 6.1 s per fruit. Onishi et al. (2019) developed a robotic apple harvester using deep learning for fruit detection. 
The system comprised a six-DoF robotic arm integrated with a stereo camera and a gripper end-effector. It detected 90% of the fruit, with an average harvesting cycle time of 16 s per fruit. However, the gripper made four turns to twist and break the peduncle, which lengthened the harvesting time. Another apple harvesting robot, developed by Baeten et al. (2008), consists of a six-DoF manipulator integrated with a vacuum-operated soft gripper end-effector with the camera mounted at its center (eye-in-hand configuration). Fruit detection accuracy was 80% (diameter range 6–11 cm), and average harvesting speed was 9 s per fruit. However, better sensing of the environment is needed to keep the soft gripper from contacting sharp limbs, and communication between the vision and control units could be improved to reduce harvesting time. Bulanon and Kataoka (2010) developed a prototype for robotic apple harvesting by integrating an RGB camera with a laser sensor. Detection accuracy for single fruit was 100% and picking success was as high as 90%, with an average detachment time of 7.1 s per fruit. However, the study was conducted in a laboratory environment, and further investigation is required to confirm performance under field conditions. FFRobotics (2020) developed a commercial robotic apple harvester and claimed a fruit detection rate of 95% in high-density orchards, with a bruise-free picking accuracy of 90%. However, collisions with limbs and trellis wires still need to be addressed.

Fig. 12.4

Examples of three robotic apple picking systems. From left to right: FFRobotics (multi-layer linear motion with three-finger gripper), Abundant Robotics (parallel robotic arm with vacuum gripper), and Washington State University (serial robotic arm with three-finger gripper)

Other tree fruits that have received attention for robotic harvesting include citrus, cherry, peach, and litchi. Mehta et al. (2014) developed an integrated citrus harvesting robot with a position controller. The system consists of a seven-DoF manipulator equipped with a gripper and RGB cameras. It harvested 95% of the fruit on the tree, with a harvest cycle of 8 s per fruit. The end-effector positioning error was smaller than the fruit diameter; however, with an average position accuracy of about ±15 mm, the system may be suitable only for medium to large citrus varieties. Harrell et al. (1990) reported a harvesting success rate of 50% with a harvest cycle time of 36 s per fruit for a citrus harvester. Energid (2020) has developed a prototype for citrus harvesting. The system has two DoF (for aiming and extension) and a camera system (for detection), with no grasping end-effector attached. The prototype picked 50% of the citrus fruit, with an average harvesting cycle time of 3 s per fruit. Robotic cherry harvesting has also attracted researchers. Tanigaki et al. (2008) developed a cherry harvesting robot comprising a four-DoF manipulator integrated with a vacuum end-effector and a 3D vision sensor with red and infrared laser diodes. For all detected cherries on the tree, the average harvesting cycle time was 14 s per fruit, and harvesting success with and without the peduncle attached to the cherry was 83% and 66%, respectively. The prototype was tested on a model cherry tree in the laboratory; given the delicacy of cherries, a more sophisticated end-effector would be needed before testing the system on real trees under field conditions. Some efforts toward integrated robotic systems for peach and litchi harvesting have also been reported. Yu et al. (2018) developed a prototype of an autonomous peach harvester. 
The system consists of a six-DoF manipulator integrated with a gripper end-effector, three RGB cameras, and a laser sensor for peach detection, distance measurement, and obstacle avoidance. Fruit detection success was 90%, with a tracking speed of 40 fps. However, the system was strongly affected by illumination conditions, which lowered detection accuracy. Similarly, Xu et al. (2011) reported a virtual prototype of a litchi harvesting robot with a five-DoF manipulator, but further research is required to develop the integrated system. A summary of recently developed robotic tree fruit harvesters is presented in Table 12.1. The integrated robotic harvesters reviewed here are all still in the development phase. An interdisciplinary approach is needed to address the engineering, horticultural, and economic issues and to make substantial progress toward the adoption of robotic tree fruit harvesting in orchards.

Table 12.1 Recent developments for tree fruit harvesting robots

Different metrics can be used to assess the performance of integrated robotic systems. Bac et al. (2014) present eight performance indicators: fruit localization success, false-positive fruit detection, detachment success, harvest success, harvest cycle time, damage rate, number of fruits evaluated in a test, and detachment attempt ratio. In general, however, the performance of harvesting robots as reported by researchers can be described with two metrics: harvesting success, the percentage of the total available fruit on the tree that is successfully picked, and harvesting speed, the time required to complete the harvesting cycle (sensing, reaching, and detaching) for a single fruit. The integrated harvesting robots developed for different tree fruits differ greatly because design requirements depend on fruit and canopy characteristics, so they cannot be compared directly. However, the metrics used to assess their performance are similar. The figure is presented to give a better understanding of the status of harvesting robots for different tree fruits and of how fruit and canopy characteristics can affect harvesting success and harvesting speed.
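The two summary metrics defined above can be computed directly from a per-fruit trial log. The following Python sketch shows one way to do so; the `PickAttempt` record and its field names are hypothetical, and the choice to average cycle time only over fruit the vision system detected is an assumption, since the reviewed studies do not share a single reporting convention.

```python
from dataclasses import dataclass


@dataclass
class PickAttempt:
    detected: bool        # fruit was localized by the vision system
    picked: bool          # fruit was successfully detached
    cycle_time_s: float   # sensing + reaching + detaching time for this fruit


def harvest_success(attempts):
    """Fraction of all available fruit that was successfully picked."""
    if not attempts:
        raise ValueError("no fruit recorded")
    return sum(a.picked for a in attempts) / len(attempts)


def mean_cycle_time(attempts):
    """Average cycle time (s) over fruit on which a pick was attempted."""
    times = [a.cycle_time_s for a in attempts if a.detected]
    if not times:
        raise ValueError("no detected fruit recorded")
    return sum(times) / len(times)


# Hypothetical trial: five fruit on the tree, one never detected, one missed pick.
trial = [
    PickAttempt(True, True, 6.0),
    PickAttempt(True, True, 7.0),
    PickAttempt(True, False, 9.0),
    PickAttempt(False, False, 0.0),
    PickAttempt(True, True, 6.0),
]
print(f"harvest success: {harvest_success(trial):.0%}")
print(f"mean cycle time: {mean_cycle_time(trial):.1f} s per fruit")
```

Keeping undetected fruit in the denominator of harvest success matches the definition given in the text (picked fruit over total available fruit), which is stricter than picked fruit over detected fruit.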

12.4 Core Technologies in Robotic Harvesting

As shown in Fig. 12.3, the core components of the robot include a camera-based sensing system that perceives the environment, including fruit, leaves, and branches; an efficient computing and processing algorithm that extracts useful information from the environment; a mechanical manipulation system that reaches the target fruit location; an end-effector tool that picks the target fruit; and a conveyor system that places the harvested fruit into a container or bin. Robotic tree fruit harvesting begins with detecting the fruit using a camera vision system and finding its location so that the manipulator can reach the target fruit and the end-effector can detach it from the tree. With advances in imaging and sensing technologies, numerous studies have reported vision-based techniques for fruit feature extraction (color, size, shape, texture, etc.), localization, and tracking (Silwal et al., 2016; Tabb et al., 2006). Environmental sensing or fruit detection can be done from a single viewpoint or from multiple viewpoints; however, the vision system faces challenges from heavy occlusion by leaves, fruit clustering, unpredictable tasks, the unstructured environment, and variable lighting conditions (Zhang et al., 2019). The second step in robotic fruit harvesting is to approach the fruit with the mechanical manipulation system. This step primarily involves planning an optimal trajectory that positions the end-effector at the required location and orientation, and sequencing or prioritizing fruit so as to minimize path length, time, energy, or other parameters that affect robot performance (Silwal et al., 2017). The number of manipulator degrees of freedom (DoF) is critical for precise positioning and orientation of the end-effector. In general, the most widely adopted manipulators in agriculture have five or more DoF. 
The target can be approached in two ways. The first, referred to as visual servoing, involves detecting the fruit coordinates in the 2D image and continuously adjusting the manipulator joint positions to keep the fruit at the same image coordinates at all times (Ringdahl et al., 2019). The second uses a global camera system, in which a camera mounted at a fixed position takes images at the beginning of a harvesting cycle to estimate the position of all fruit in the camera view. The approach path can then be established using inverse kinematics for each initial and final position of the end-effector; however, accurate calibration between the vision system and the manipulator is essential for reaching the target precisely (Zhang et al., 2019).
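To make the inverse kinematics step concrete, the sketch below solves it in closed form for a planar two-link arm. This is a teaching simplification: the harvesting manipulators reviewed here have four to seven DoF and generally need numerical or more elaborate analytic solvers. The link lengths and target position are hypothetical, and only the solution with a non-negative elbow angle is returned.

```python
import math


def forward(l1, l2, q1, q2):
    """End-effector position (x, y) of a planar two-link arm."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


def inverse(l1, l2, x, y):
    """Joint angles (rad) that place the end-effector at (x, y)."""
    # Law of cosines gives the elbow angle q2.
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0 + 1e-12:
        raise ValueError("target is outside the reachable workspace")
    c2 = max(-1.0, min(1.0, c2))  # guard against rounding at the boundary
    q2 = math.acos(c2)
    # Shoulder angle: direction to target minus the offset caused by link 2.
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2


# Hypothetical arm (0.5 m and 0.4 m links) reaching a fruit at (0.6, 0.3) m.
q1, q2 = inverse(0.5, 0.4, 0.6, 0.3)
print(forward(0.5, 0.4, q1, q2))
```

Feeding the computed angles back through `forward` recovers the target, which is the same consistency check used when calibrating a vision system against a manipulator: a fruit position estimated by the camera must map to joint angles that actually bring the end-effector to that point.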

12.4.1 Machine Vision for Fruit Harvesting

Robotic tree fruit harvesting requires two major tasks: accurately recognizing the fruit on the tree, and detaching the fruit without damaging it or any part of the tree. An illustration of machine vision based automatic tree fruit detection is presented in Fig. 12.5. Machine vision uses advanced sensors (i.e., cameras) that capture images, together with processing hardware and software algorithms, to automate visual inspection, detection, and localization and to guide the end-effector precisely so that fruit is harvested successfully from the tree branches. For robotic fruit harvesting, automatic fruit detection and localization have been carried out mainly with machine vision techniques. Camera sensors capture images of the trees, which is the first step toward fruit detection and, in turn, fruit harvesting.

Fig. 12.5

An illustration of machine vision based automatic tree fruit detection

12.4.1.1 Camera Sensors for Fruit Harvesting

A camera is an optical instrument that records visual features in the form of image or video signals, which can be used to distinguish fruit from leaves, trunks, branches, and other neighboring objects under real orchard conditions. A camera lens collects the light rays arriving from the scene and uses glass to redirect them to a single point, forming a sharp image of the objects. Four types of cameras have been used so far for fruit recognition (black and white, color, spectral, and thermal), and three types for fruit localization (color, stereovision, and time-of-flight). A color camera uses filters to separate light into its three primary colors: red, green, and blue. After recording all three primary colors, the camera combines them to create the full-color image. A color camera thus captures light in three wavelength bands of the visible spectrum (400–700 nm). A spectral camera uses multiple bands of the electromagnetic spectrum (e.g., near-infrared: 750–900 nm; hyperspectral: 400–1100 nm in steps of 20 nm) and usually goes beyond the color camera in the object information it collects. A thermal camera detects temperature by capturing different levels of infrared radiation at wavelengths of 1–14 μm to distinguish between objects. Unlike a single-lens camera, a stereovision camera consists of two or more lenses with separate image sensors that view the same object and can therefore recover its 3D structure. A time-of-flight camera measures the distance between the camera sensor and the object for each point of the image by calculating the time difference between the emission of an artificial light signal and its return after reflection by the object. The advantages and drawbacks of the various camera sensors are summarized in Table 12.2.
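The two localization principles described above reduce to short formulas. For a time-of-flight sensor, distance is half the round-trip travel of light, d = c·Δt/2. For a rectified stereo pair with parallel optical axes, depth follows Z = f·B/disparity. The Python sketch below evaluates both; the focal length, baseline, disparity, and timing values are hypothetical and chosen only to give a fruit distance of about 1.5 m.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def tof_distance(round_trip_time_s):
    """Distance (m) from the round-trip time of an emitted light signal."""
    return SPEED_OF_LIGHT_M_S * round_trip_time_s / 2.0


def stereo_depth(focal_length_px, baseline_m, disparity_px):
    """Depth (m) of a point seen by a rectified, parallel stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical readings for a fruit roughly 1.5 m from the sensor.
print(f"time-of-flight: {tof_distance(10e-9):.3f} m")
print(f"stereo:         {stereo_depth(700.0, 0.12, 56.0):.3f} m")
```

A 10 ns round trip corresponds to only about 1.5 m, which shows why time-of-flight sensors need very fine timing resolution. In the stereo case, depth is inversely proportional to disparity, so a one-pixel matching error matters more for distant fruit than for nearby fruit.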

Table 12.2 Advantages and disadvantages of different camera sensors used for tree fruit detection and localization

The earliest studies, dating back to the late 1980s, applied black and white cameras to fruit detection as a first step toward an automatic fruit harvesting system (Whittaker et al., 1987); however, their success was limited by the sensor’s inability to acquire useful color information. Color is the most prominent feature for tree fruit detection, especially for ripe fruit (e.g., red apple, yellow orange, dark yellow peach), and cannot be extracted with a black and white camera, hence the need for color cameras. Color cameras, introduced in the early 1990s, provided the first opportunity to detect fruit based on color features along with geometric and texture information. Color cameras perform adequately when the color of the ripe fruit differs from that of the leaves, branches, and background (e.g., red apples, yellow citrus, and yellow pears against a green background). They perform poorly when fruit color is the same as that of the leaves or background and only color information is considered. Another problem is that color cameras are highly susceptible to illumination variation, which limits their suitability under orchard conditions. Spectral camera sensors addressed the color similarity problem between fruit and background by providing spectral information along with spatial information about fruit, leaves, branches, and other objects (Kondo et al., 1996). The potential of spectral cameras for fruit detection has been demonstrated using different wavelengths suited to the appearance of different fruits. However, a major limitation is the long data acquisition and processing time, especially for hyperspectral cameras (Kim & Reid, 2004), which makes spectral sensors difficult to use for real-time detection. 
Thermal cameras have also been used for fruit detection to solve the color similarity problem between fruit, other objects, and background, but the performance of this type of sensor is strongly affected by fruit size and direct sunlight exposure. Thermal camera accuracy is lower in shaded and dense-canopy areas because there is no significant temperature difference between fruit and other objects, including leaves, branches, and background, in those regions. Apart from the fruit recognition sensors, stereovision and time-of-flight cameras are used mainly for fruit localization. A stereovision camera measures the position of target objects relative to the camera by stereo matching of multiple images acquired by several cameras installed on different arms. However, the performance of this vision system is affected by illumination variation, wind speed and direction, and the efficiency of the hardware components (Plebe & Grasso, 2001). Another major limitation is its long computation time and complexity. Time-of-flight cameras were introduced for fruit localization because of their faster data acquisition and processing. In the last few years, time-of-flight cameras have shown promise for fruit localization and are also suitable for outdoor orchard environments, especially in the form of an RGB-D (Red, Green, Blue-Depth) camera (Fu et al., 2020), which provides RGB information along with depth and infrared information; however, direct sunlight exposure can affect sensor accuracy.

12.4.1.2 Fruit Detection and Classification for Harvesting

The first step of a camera vision system for fruit harvesting is image acquisition, in which images are captured in the tree fruit orchard. After an image has been captured and pre-processed, different processing methods, including feature extraction and classification, can be applied to separate fruits from leaves, branches, and other background objects. Color is one of the most valuable features used in image processing-based detection to differentiate fruits from neighboring objects (i.e., other fruits, foliage, or branches) present in the orchard environment. The first attempt toward developing a robotic harvester using color features was to distinguish oranges from the natural background; it detected 75% of the fruit pixels successfully and showed the potential of color features for fruit detection (Slaughter & Harrell, 1989). The accuracy of fruit detection using only color features was improved in later years to as high as 88.0% (Bulanon et al., 2002; Qiang et al., 2014); however, detection accuracy based on color features is greatly affected by illumination variation, fruit variety, fruit maturity level, and the uncontrollable orchard environment. Illumination variation during image capture produces different light intensities; therefore, it is very challenging to detect fruits with color features under uncontrolled lighting. Geometric features, mainly the size and shape of the fruits, are used to address the problems of color features, especially when green fruits need to be detected against a green leafy background. These features are also less susceptible to illumination variations, which makes them suitable for real-time orchard conditions unless the image is blurred during acquisition by high wind velocity. Lu et al. (2018) detected green immature citrus fruit using geometric features and achieved a precision rate of 82.3%. 
The performance of geometric features (i.e., searching for circles) reached up to 85% accuracy for detecting green apples against a green background when the apples were visible in the captured images, but occluded apples caused false-positive detections (i.e., leaves, stems, and branches were identified as fruits) (Linker et al., 2012). In contrast, geometric features based on an iterative Circular Hough Transform (CHT) and blob analysis provided over 90% accuracy in detecting clearly visible and partially occluded “Jazz” and “Fuji” apples on trees with a tall spindle canopy architecture (Silwal et al., 2014). However, the major problem with geometric features is fruit occlusion, which leads to poor performance because it alters the apparent size, shape, and other geometric characteristics of the fruits. Texture is another important feature; because it is not affected by surface color, it can also be used to detect fruits against a background of similar color (i.e., leaves and stems). Tree fruits generally have smoother surfaces than leaves, branches, stems, and other objects. Fruit detection using texture features isolates the surfaces with homogeneous texture and then identifies the edges of the isolated surfaces (Zhao et al., 2005). The performance of these features for fruit detection is limited when texture features are used alone. Together with a novel Eigen Fruit approach and blob analysis, a Gabor wavelet-based texture analysis was used to detect green citrus and achieved an accuracy of 75.3% with 27.3% false detection (Kurtulmus et al., 2011). Variable illumination, the complexity of the fruit background, and varying fruit size strongly affect the texture properties of fruits and reduce detection accuracy (Zhao et al., 2005; Kurtulmus et al., 2011). 
Combining texture features with other features (i.e., color and geometric) can raise the accuracy to as high as 89% in detecting “Golden Delicious” and “Jonagold” apples (Stajnko et al., 2009). Besides color, geometric, and texture features, the 3D shape of the fruit has been reconstructed (Fig. 12.6) to improve detection accuracy, but the methodology was only justified theoretically, and ample tests are therefore still required to show its reliability in real-time orchard applications (Rakun et al., 2011).

Fig. 12.6

Original acquired image (top left), color-segmented version (top right), version cleaned by applying morphological operators (bottom left), and 3D shape analysis (bottom right). (Adopted from Rakun et al., 2011)
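
As an illustration of color-based segmentation, the following minimal sketch labels a pixel as fruit when its red chromaticity r/(r + g + b) exceeds a threshold. Normalizing by the pixel sum is one simple way to reduce (not remove) sensitivity to brightness; the threshold, the function names, and the tiny hand-made image are assumptions for illustration and do not reproduce the method of any cited study.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def red_chromaticity(pixel: Pixel) -> float:
    """Share of red in the pixel; unchanged when all channels scale together."""
    r, g, b = pixel
    total = r + g + b
    return r / total if total else 0.0


def segment_red_fruit(image: List[List[Pixel]], threshold: float = 0.5) -> List[List[int]]:
    """Binary mask: 1 where red chromaticity exceeds the (assumed) threshold."""
    return [[1 if red_chromaticity(p) > threshold else 0 for p in row] for row in image]


# Hypothetical 2 x 3 image: leaves, a branch, and sunlit/shaded red apple pixels.
image = [
    [(40, 120, 30), (200, 40, 35), (100, 20, 18)],  # leaf, sunlit apple, shaded apple
    [(20, 60, 15), (90, 85, 80), (210, 50, 40)],    # shaded leaf, branch, sunlit apple
]
mask = segment_red_fruit(image)
fruit_fraction = sum(map(sum, mask)) / (len(mask) * len(mask[0]))
```

In this toy image the sunlit and shaded apple pixels fall on the same side of the threshold because their channel ratios are similar. A rule of this kind cannot separate green fruit from green foliage, which is the case where the geometric and texture features discussed above become necessary.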

To successfully separate fruit from neighboring objects, image classification is required after valuable features have been extracted from the images. The classifiers used so far for fruit detection include supervised classifiers such as Bayesian and K-nearest neighbor classifiers, unsupervised classifiers such as K-means clustering, and soft computing methods such as the artificial neural network (ANN) and support vector machine (SVM). The Bayesian classifier is a multivariate statistical classification technique widely used for object detection and classification; it combines prior knowledge with probability distributions to obtain posterior probabilities. A Bayesian discriminant was used to classify oranges based on color information and classified 75% of fruits successfully (Slaughter & Harrell, 1989). Using a similar method, Juste and Sevila (1992) applied a pattern classification method based on Bayes’ rule to citrus fruit detection and reported up to 90% accuracy. Although the higher detection accuracy shows the potential of the Bayesian classifier for fruit detection, its major drawback is that it requires prior probability information, which can be affected by changes in the color values of fruits caused by illumination variations (Chinchuluun et al., 2007). The K-nearest neighbor (KNN) supervised classifier, which is also susceptible to illumination variation, assigns an unknown feature vector to a class by measuring the closeness between the unknown sample and each training sample. To detect green apples in captured RGB images, a KNN classifier was applied to two datasets recorded under direct illumination and diffuse light conditions, with reported correct detection accuracies of 85% and 95%, respectively (Linker et al., 2012). Another significant drawback of KNN-based algorithms is the long processing time needed to classify an unknown feature vector, which makes them unsuitable for real-time field applications (Mitchell, 1997). 
Besides supervised classifiers, K-means clustering, an unsupervised machine learning classifier, is also used for fruit detection; it assigns every data point to the nearest cluster based on the distances between points. However, the performance of K-means clustering in fruit detection has been limited across different image types, including color and thermal images, especially for green apples (Wachs et al., 2010). The soft computing methods, including ANN and SVM, are also supervised machine learning algorithms and have become popular and widely accepted for fruit detection in orchards (Wachs et al., 2010; Qiang et al., 2014). An SVM-based classifier separates two classes with a hyperplane that leaves the largest margin between them. Tao and Zhou (2017) detected apples, branches, and leaves using a multi-class SVM classification method and achieved accuracies of 94.64%, 47.05%, and 75%, respectively, on images acquired with a Kinect V2 camera sensor. Using the same camera, Lin et al. (2019) detected citrus fruits with an SVM algorithm and reported an F1-score of 91.97% using color, gradient, and geometry features. Qiang et al. (2014) applied a multi-class SVM classifier with an RBF kernel function to distinguish citrus fruits from leaves and branches using color features and reported a detection accuracy of 92.4%. The authors identified illumination variation, fruit occlusion, and immature fruit as the major factors reducing the performance of the classifier and of the system as a whole. Apart from the SVM-based soft computing method, an ANN-based machine learning algorithm detects fruits by learning specific patterns or models defined by the training data through an iterative training process. 
To develop an orange picking robot, a neural network (i.e., back propagation) based machine learning algorithm was used with color features to detect oranges in images captured under different lighting conditions and achieved an accuracy of 87% with 15% false positives and 5% false negatives (Plebe & Grasso, 2001). Additionally, Kurtulmus et al. (2014) compared the performance of three classifiers, a statistical classifier (i.e., discriminant analysis), an ANN, and an SVM, for immature peach detection under various illumination conditions and reported that the ANN classifier performed better (82%) than discriminant analysis (80%) and the SVM (62%). Although both supervised and unsupervised machine learning classifiers have shown good performance, significant advances have recently been made by applying deep learning algorithms, which are based on multi-layer ANNs, to fruit detection; their larger learning capacity results in higher performance and precision (Koirala et al., 2019).
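
The KNN rule described above can be sketched in a few lines. The example assumes a two-dimensional feature vector (red chromaticity, circularity) and a handful of made-up training samples; the feature choice, labels, and values are hypothetical and serve only to show the voting mechanism.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_classify(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Assign the query to the majority class among its k nearest training samples."""
    # Every query is compared with every training sample, which is why the
    # cost of classification grows with the size of the training set.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (red chromaticity, circularity) -> label.
train = [
    ((0.70, 0.90), "fruit"),
    ((0.65, 0.85), "fruit"),
    ((0.72, 0.80), "fruit"),
    ((0.25, 0.30), "leaf"),
    ((0.22, 0.40), "leaf"),
    ((0.35, 0.20), "branch"),
]

label_a = knn_classify(train, (0.68, 0.88))
label_b = knn_classify(train, (0.24, 0.33))
```

Because the distance to every stored sample is computed for each query, this brute-force form makes the processing-time drawback noted above concrete; spatial indexing structures can reduce the cost but are outside the scope of this sketch.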

Deep learning is a machine learning technique that can learn features directly from raw data and provides a hierarchical representation of the data through deeper neural networks and multiple convolutions. Object detection using deep learning algorithms has become more popular in recent years because of its higher detection rate and fast detection speed, and it is applied in various fields of research (Gao et al., 2020). Deep learning networks, including the convolutional neural network (CNN), region-based CNN (R-CNN), Fast R-CNN, Faster R-CNN, and You Only Look Once (YOLO) networks, have been increasingly applied in recent years to orchard management and provide an excellent framework for fruit detection (Bargoti & Underwood, 2017; Fu et al., 2020; Gao et al., 2020). Building on the rapid progress in deep learning algorithms, a Faster R-CNN model was tested and achieved an accuracy of 95% for Fuji apple detection (Gené-Mola et al., 2019). To reduce detection time and improve detection accuracy, Wan and Goudos (2020) modified the convolution and pooling layers of Faster R-CNN; the resulting model was tested for green apple and orange detection and achieved accuracies of 92.51% and 90.73%, respectively. A review of numerous deep learning algorithms (i.e., YOLOv3, R-CNN, and VGG-16) applied to apple fruit detection reported detection accuracies ranging between 84% and 95% (Koirala et al., 2019). By combining a semi-supervised method based on Gaussian Mixture Models with a deep learning method, Häni et al. (2020) developed a novel semantic segmentation-based approach for fruit detection and counting and reported that it can outperform a single deep learning model, with detection accuracies ranging from 95.56% to 97.83%. 
Compared with conventional machine learning models, emerging deep learning algorithms show promising potential, offering the higher detection accuracy and faster detection speed that are necessary for robotic fruit harvesting in real-time orchard conditions.
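
Detection results such as those above are commonly summarized with precision, recall, and the F1-score, with a predicted bounding box counted as correct when its intersection over union (IoU) with a ground-truth box exceeds a chosen threshold. The sketch below shows these standard definitions; the counts and boxes are hypothetical, and the cited studies may define "accuracy" differently.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def detection_scores(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: a predicted box shifted half a box width from the truth.
overlap = iou((0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 15.0, 10.0))

# Hypothetical counts: 90 fruits found, 10 false alarms, 30 fruits missed.
precision, recall, f1 = detection_scores(tp=90, fp=10, fn=30)
```

Reporting precision and recall separately distinguishes a detector that misses occluded fruit (low recall) from one that mistakes leaves for fruit (low precision), two failure modes that affect a harvesting robot differently.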

12.4.1.3 Fruit Localization for Harvesting

The step after detection is fruit localization, another essential part of the computer vision system, which guides the robotic end-effector to grab and detach fruit from the tree. Inaccurate fruit localization causes the end-effector to fail in harvesting the fruit. Although uncontrollable orchard conditions (i.e., wind velocity, fruit occlusion, etc.) pose various challenges, studies have been conducted toward accurate fruit localization (Bac et al., 2014). Fruit localization began with the use of a single black and white camera to identify fruit centroids, with the aim of extracting 3D coordinates for grasping fruit from the branches through a mathematical transformation model (Parrish & Goksel, 1977). About a decade later, a color camera was applied to identify fruit centroids and guide the extension of a telescopic end-effector. This was made possible by mounting the camera at the center of the end-effector, so that the fruit centroid in the captured image lined up with the axis of the prismatic joint (Slaughter & Harrell, 1989). Apart from fruit centroids, studies have also been conducted to localize the fruit peduncle using a color camera to ease fruit harvesting, especially detachment (Bulanon et al., 2001). To obtain more precise fruit locations, laser systems have also been used to some extent along with camera sensors: the 2D location of the fruit was obtained from camera vision, and a laser sensor measured the distance between the end-effector and the fruit (Bulanon et al., 2004). Besides single-camera applications, several attempts have been reported using more than one camera in a stereovision arrangement, where fruits were located by triangulation. However, the main difficulty with a stereovision system was the correspondence problem, that is, the difficulty of obtaining matching reference points in practical views (Wang et al., 2013). 
Researchers attempted to solve the correspondence problem in stereovision systems but were left with errors of less than 20 mm, attributed to densely distributed tree canopies (Si et al., 2015). Aside from stereovision, red-green-blue-depth (RGB-D) cameras such as the Kinect V2 offer a new approach to extracting 3D spatial information for detecting and localizing fruits simultaneously (Fu et al., 2020). Studies have reported that an RGB-D camera combined with advanced machine learning algorithms, including the Bayes classifier and Faster R-CNN, can be appropriate for real-time orchard conditions, with detection/recognition accuracies ranging from 92% to 95% and localization errors of 7.0 ± 2.5 mm, −4.0 ± 3.0 mm, and 13.0 ± 3.0 mm in the x, y, and z directions, respectively (Zheng et al., 2018; Lin et al., 2019). On the other hand, several studies have reported that the RealSense RGB-D camera performed better than the Kinect V2, with an image resolution of 1280 × 720 pixels and a sampling frequency of 90 frames per second compared with 512 × 424 pixels and 30 frames per second (Mejia-Trujillo et al., 2019). Given the promise RealSense RGB-D cameras have shown in fruit detection and localization, they could become an effective, highly accurate tool for real-time orchard applications in the future.
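
With an RGB-D camera, the 3D position of a detected fruit can be obtained from its pixel coordinates and the depth value at that pixel. The sketch below assumes an ideal pinhole camera without lens distortion; the intrinsic parameters (fx, fy, cx, cy) and the pixel and depth values are hypothetical and do not correspond to any specific sensor.

```python
from typing import Tuple


def deproject_pixel(
    u: float, v: float, depth_mm: float,
    fx: float, fy: float, cx: float, cy: float,
) -> Tuple[float, float, float]:
    """Convert a pixel (u, v) with measured depth into camera-frame X, Y, Z (mm).

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    """
    x = (u - cx) * depth_mm / fx
    y = (v - cy) * depth_mm / fy
    return (x, y, depth_mm)


# Hypothetical intrinsics for a 1280 x 720 image and a fruit centroid at
# pixel (760, 300) measured 1000 mm from the camera.
point = deproject_pixel(
    u=760.0, v=300.0, depth_mm=1000.0,
    fx=600.0, fy=600.0, cx=640.0, cy=360.0,
)
```

The resulting coordinates are expressed in the camera frame; a further calibrated transformation into the manipulator base frame would be needed before the end-effector could use them.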

12.4.2 Fruit Removal Dynamics and End-Effector Development

Fruit detachment is one of the major tasks in robotic fruit harvesting. Before a fruit picking end-effector is designed, it is necessary to investigate the dynamics of fruit detachment. The information provided by the dynamics includes the picking or cutting force/torque, the picking angle, and the fruit detachment motion. Typically, robotic picking requires fruit detachment motions to be planned and performed with sufficient grasping force applied to the target fruit (Tillett, 1993). A human picker detaches an apple by gently gripping it with the fingers and twisting it around the connection point of its stem and the limb. At the same time, the picker puts one finger on the connection point to minimize the movement of this point, which acts as the pivot. Reduced movement of the pivot point increases the torque around this point and thus increases the effectiveness of fruit detachment. Preliminary tests showed that twisting apples with the pivot held in place could achieve more effective and efficient detachment than pulling them (Bulanon & Kataoka, 2010).

To provide baseline information for developing a conceptual robotic end-effector for apple picking, a Washington State University (WSU) research group conducted a series of fundamental physics studies of apple picking by mimicking human picking operations (Fig. 12.7, He et al., unpublished document). The physical quantities studied included the picking orientation, the applied force/torque, and their relations to apple weight and stem length. Three flexible force sensors were mounted on three fingers of the picker to measure the force applied to the apple surface during picking. A hand-held picking device, consisting of a gripper and a torque sensor, was built to measure the twisting torque needed to remove apples from the tree. Tests showed that picking an apple along the peduncle direction gave much higher picking efficiency. The force applied to the apple surface varied among pickers and fingers, ranging from 0.43 ± 0.27 to 1.16 ± 0.33 kgf in this study, and it was positively related to fruit weight. The results also indicated that the detachment torque increased with apple weight and that the picking angle increased with apple stem length. Furthermore, Davidson et al. (2016) investigated hand picking dynamics for robotic apple harvester design. The results indicated that each variety has a different detachment force. The study also suggested using a tactile sensor in a robotic end-effector to potentially determine the point of fruit separation and minimize the path traveled by the end-effector during harvesting. Li et al. (2016) indicated that a bending motion could improve fruit detachment performance in apple picking: to remove a fruit from the branch, bend-and-pull picking requires less energy than pulling straight along the stem growth direction. 
Flood (2006) designed a robotic citrus harvesting end-effector and a force control model using physical properties and harvesting motion tests.

Fig. 12.7

Setup and method for measuring apple hand-picking force: (a) force sensor; (b) sensor-equipped picking glove; (c) picking an apple by twisting; (d) picking an apple by pulling

The end-effector is a critical component of a harvesting robot; it detaches fruits from the tree with appropriate force and motion. Designing an end-effector for fruit harvesting can be challenging because of the complex canopy environment and unique fruit characteristics. The design should consider mechanical and spatial requirements, including size, shape, weight, and maneuverability, as well as the requirements of the task object, including its physical, horticultural, and biological properties (Kondo & Ting, 1998). Researchers have put considerable effort into developing end-effectors to harvest different kinds of crops, including oranges, tomatoes, eggplants, cucumbers, and apples (Muscato et al., 2005; Whittaker et al., 1987; Van Henten et al., 2003; Hayashi et al., 2002; Davidson & Mo, 2015). Different detachment motions have also been tested in these studies, such as pulling, twisting, cutting, and combinations of two of these. Zhang et al. (2020) extensively reviewed the robotic grippers used in agricultural applications along with their grasping and control strategies. Many picking end-effectors use two or more fingers to grasp the fruit and detach it (Burks et al., 2005). Some end-effectors use air suction to grip the object and then use scissors to cut the peduncle, which may damage the fruit peduncle. Other suction devices use a vacuum cup to hold the fruit, combined with an appropriate mechanism to detach it from the tree, such as cutting the peduncle with a blade mounted on the fingers (Hayashi et al., 2014) or applying a twisting motion (Yaguchi et al., 2016). Bac et al. (2017) developed a four-fingered hand with a pair of scissors mounted on top to cut the stem. This hand may be more suitable for fruit with longer stems; however, detecting and locating the stem is challenging in the complex canopy environment.

12.4.3 Harvesting Robot Manipulation

12.4.3.1 Robotic Manipulators

The tree canopy–machine interaction can be interpreted as the manipulation of a machine (robotic arm/manipulator) within the tree canopy to reach the identified fruit locations and harvest the fruit with an end-effector. The robotic arm or manipulator is a mechanical system resembling a human arm, usually composed of links connected in series by joints, that performs the intended tasks in one-, two-, or three-dimensional space. Each joint in the manipulator has one DoF, and the kinematic dexterity is directly related to the number and type of joints in the manipulator (Burks et al., 2018). Currently available industrial manipulators are designed to perform repetitive tasks with uniform objects in an unconstrained workspace. In contrast, adopting robotic manipulators for fruit harvesting presents many challenges because agriculture is a constrained, dynamic environment where the target objects vary in shape, size, position, and orientation (Simonton, 1991). Successful adoption of a robotic manipulator requires consideration of its working environment (Kondo & Ting, 1998; Simonton, 1991). Thus, robotic manipulators for tree fruit harvesting should be designed with factors such as canopy structure and branch density in mind for safe operation in the unstructured agricultural environment.

In an agricultural robot, the first joint of the manipulator is connected to the base of a mobile platform, and an integrated end-effector unit, consisting of a tool or gripper that performs the required task, is attached to the last joint. The manipulator mainly positions the end-effector close to the target fruit and then moves the harvested fruit to the collection bin or container. For tree fruit harvesting, the manipulator can be designed in various configurations based on the total number of DoFs and different combinations of joint types. The selection of the joint configuration is critical because it affects the kinematic dexterity and the spatial requirements when the robot is maneuvered to attain different positions and orientations of the end-effector. Depending on the total DoFs selected, a manipulator can be designed with three or more joints. However, increasing the number of joints (DoFs) exponentially increases the computation and control complexity (Choset et al., 2005). A three DoF (3 DoF) manipulator is the most common choice because of its simple design and control architecture. For a known Cartesian position of the fruit, a 3 DoF manipulator (Harrell et al., 1990) can easily reach the desired position using inverse kinematics. However, the end-effector (gripper) cannot alter its orientation because of the lack of DoFs. As the fruits on a tree grow at random orientations, the manipulator should be able to grasp fruit from different orientations. Manipulator performance decreases if the fruits are occluded behind leaves or branches, and the gripper may not be able to harvest the fruit. 
Adding DoFs to the manipulator, such as in a four DoF (Tanigaki et al., 2008) or five DoF (Zahid et al., 2020a) design, can solve the problem to some extent by making it possible to adjust the orientation of the end-effector, but harvesting fruits located behind obstacles deep inside the canopy can still be problematic. To completely describe the six components of the Cartesian space, namely three positional components (x, y, and z) and three angular components (yaw, pitch, and roll), the manipulator should have six joints in its assembly. Thus, an agricultural manipulator should have at least six DoFs (Onishi et al., 2019) to attain all possible positions and orientations in the workspace. However, with higher DoFs, the kinematics of the manipulator yields two poses (elbow up and elbow down) for any desired position and orientation, which can lead to a higher chance of the manipulator colliding with branches in some poses, damaging the manipulator, fruit, or tree. Another problem with six DoFs is that the manipulator is limited to a single pose in the workspace, so it may not be able to avoid all obstacles, which is essential for the safe operation of the robot (Burks et al., 2018). Considering the unstructured canopy environment, a manipulator with at least one excess DoF for positioning and orientation, such as a seven DoF manipulator (Mehta et al., 2014; Silwal et al., 2016), is a better solution for avoiding collisions; such manipulators are known as redundant manipulators. Redundant manipulators can attain multiple orientations for any target position and can avoid collisions by changing to the optimal pose. Although redundant manipulators improve the kinematic dexterity for grasping fruit by attaining different orientations, they also increase the complexity of manipulation control (Fig. 12.8).
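
The elbow-up and elbow-down poses can be illustrated with the inverse kinematics of a planar two-link arm, a deliberately simplified stand-in for the manipulators discussed above. The link lengths, target position, and function names are hypothetical; the sketch only shows that one target position can be reached with two different joint configurations.

```python
import math
from typing import List, Tuple


def two_link_ik(x: float, y: float, l1: float, l2: float) -> List[Tuple[float, float]]:
    """Joint angles (q1, q2) in radians that place the tip of a planar arm at (x, y).

    Returns two solutions when the target is reachable (q2 > 0 and q2 < 0),
    or an empty list when it lies outside the workspace.
    """
    cos_q2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_q2) > 1.0:
        return []  # target is too far away or too close to the base
    solutions = []
    for sign in (1.0, -1.0):
        q2 = sign * math.acos(cos_q2)
        q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
        solutions.append((q1, q2))
    return solutions


def forward(q1: float, q2: float, l1: float, l2: float) -> Tuple[float, float]:
    """Tip position for given joint angles (used here to check the solutions)."""
    return (
        l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
        l1 * math.sin(q1) + l2 * math.sin(q1 + q2),
    )


# Hypothetical arm with two 0.5 m links reaching a fruit at (0.6, 0.4) m.
poses = two_link_ik(0.6, 0.4, l1=0.5, l2=0.5)
# Height of the elbow joint in each pose: one lies below and one above the
# straight line from the base to the fruit.
elbow_heights = [0.5 * math.sin(q1) for q1, _ in poses]
```

Both poses place the tip at the same point, but the intermediate link sweeps a different region of space, so only one of them may be free of branches. A planner for a real manipulator would have to check each candidate pose against the detected obstacles.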

Fig. 12.8

Illustration of manipulator with different joint configurations and wrist end-effector; (a) Cartesian (PPP), (b) Cylindrical (PPR), (c) Spherical (RRP), and (d) Articulated (RRR)

The performance of a tree fruit robotic manipulator can also be affected by the types of joints used in its assembly, such as prismatic joints, rotational joints, or a combination of both. Figure 12.8 shows a few examples of different configurations of the first three joints for a six DoF manipulator integrated with a spherical wrist gripper end-effector. The first three joints, referred to as the Cartesian positioning (x, y, and z) links, move the end-effector into the proximity of the target fruit. The last three joints, referred to as the wrist, alter the orientation of the end-effector for accurate positioning at the target. Each of the manipulators shown has a different workspace and different spatial requirements for manipulation. During maneuvering, each joint contributes to altering the manipulator pose and the end-effector position and orientation in the canopy. When the manipulator starts maneuvering inside the canopy, most of the change in position and pose of the manipulator links comes from the positioning joints, with a small contribution from the wrist joints. The greater the pose change, the higher the chance of collision with branches within the canopy; therefore, the positioning joints should be selected to allow the minimum change in manipulator pose during maneuvering. Zahid et al. (2020b) developed an apple tree pruning manipulator by integrating three prismatic joints (3P DoF) with three revolute joints (3R DoF). The integrated tree pruning manipulator showed promising results, as it was able to reach all selected branches with a lower pose change, which reduced the collision potential. In general, Cartesian/prismatic joints have low pose change attributes, as the orientation of the links remains the same irrespective of the joint movement. Thus, a manipulator can be developed with different joint types to reduce the spatial requirements. For example, the positioning joints shown in Fig. 12.8a may perform the positioning motion outside the canopy with a slight pose change and may have smaller spatial requirements for maneuvering the spherical wrist end-effector within the tree canopy to reach target fruits. Similarly, when the aim is to reach fruits inside the canopy, the different joint combinations shown in the figure affect the manipulator pose change differently during maneuvering within the canopy. Thus, the manipulator design should consider the requirements of different tree features, such as canopy size and structure, to ensure that the end-effector reaches all positions in the canopy with minimum spatial requirements and the least chance of collision with branches.

12.4.3.2 Robotic System Control

An agricultural robot must solve a multitude of problems to perform operations such as fruit harvesting, thinning, and pruning. Unlike the case of industrial robots, which perform repetitive work on identical objects, the target fruits are located at different positions and orientations. Thus, during agricultural operation, the same motion or path is never repeated, and the robot needs information about every target to perform the target-specific motion. Manipulator movement and control can be established using information from the sensing or vision system, which is referred to as vision-based manipulation control. Vision-based control is essential for a harvesting robot because the manipulator can use the visual information for path planning and motion. The inefficiency of vision-based control is one of the primary factors limiting the performance of harvesting robots. Vision-based control is categorized into two types: visual navigation, or visual servo control, and eye-hand coordination, or the global camera system (Zhao et al., 2016). The global camera system is an open-loop control system that operates based on “3D positioning.” The camera system scans the complete scene to detect all fruits, and the manipulator then starts moving to the target fruits. The control efficiency in terms of end-effector positioning depends on the accuracy of the vision system and the calibration of the manipulator and camera system (Yau & Wang, 1996). To achieve higher efficiency, the vision system may include stereovision or range sensors to precisely measure the distance to the target fruits. However, for open-loop visual control, an accurate kinematic model of the manipulator is essential for planning the path to the target fruits. Han et al. (2012) successfully established open-loop visual control for path planning using a color stereoscopic camera and a laser sensor. The execution time for a successful harvest was less than 7 s per fruit. 
However, one downside of open-loop visual control is its low efficiency in situations where the fruit moves because of wind or other causes.

The second category of vision-based control is the vision-based feedback control loop, also referred to as visual servo (Corke & Hager, 1998). The visual servo is a closed-loop control system that operates as a dynamic system based on “concurrent looking and moving.” The visual servo uses the image features extracted from the camera-in-hand system to control the position and orientation of the end-effector on the fly (Hashimoto, 2003). A major advantage of visual feedback control is that its performance does not rely on the accuracy of the kinematic model or on the calibration of the vision and manipulator systems. However, one important consideration for achieving high efficiency with visual servo control is that the bandwidth of the vision controllers should match the frame rate of the visual information coming from the camera system. Zhao et al. (2011) successfully implemented visual servo control in an apple harvesting robot. Font et al. (2014) combined open-loop and visual servo control: under open-loop control, the end-effector moves quickly into the proximity of the target fruit, and its position and orientation are then adjusted under visual servo guidance to harvest the fruit. A general comparison of these two types of control is given in Table 12.3.
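
The “concurrent looking and moving” idea can be sketched as a proportional feedback loop on the pixel error between the observed fruit centroid and the image position where the end-effector would be aligned with it. This is a strongly simplified illustration: it assumes that each command shifts the image feature by exactly the commanded amount, whereas practical visual servo schemes relate image motion to camera motion through an interaction matrix. The gain, tolerance, and pixel values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def servo_command(feature_px: Point, goal_px: Point, gain: float) -> Point:
    """Proportional command: move a fraction of the remaining pixel error."""
    return (gain * (goal_px[0] - feature_px[0]), gain * (goal_px[1] - feature_px[1]))


def run_visual_servo(
    feature_px: Point, goal_px: Point,
    gain: float = 0.5, tol_px: float = 1.0, max_iters: int = 50,
) -> List[Point]:
    """Repeat look-then-move until the feature is within tol_px of the goal."""
    history = [feature_px]
    for _ in range(max_iters):
        fx, fy = history[-1]
        if max(abs(goal_px[0] - fx), abs(goal_px[1] - fy)) <= tol_px:
            break
        dx, dy = servo_command((fx, fy), goal_px, gain)
        # Assumed ideal response: the feature moves by the commanded amount.
        history.append((fx + dx, fy + dy))
    return history


# Hypothetical case: centroid seen at (400, 300) px, goal at the image
# center (320, 240) px of a 640 x 480 image.
trajectory = run_visual_servo((400.0, 300.0), (320.0, 240.0))
```

Because the error is measured again in every cycle, a fruit displaced by wind would simply produce a new error in the next frame, which is why the closed loop tolerates target motion and calibration errors better than a single open-loop move.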

Table 12.3 Comparison of the two types of vision-based control

12.4.3.3 Collision-Free Path Planning

Path planning is one of the most important components for the successful operation of a harvesting robot. Path planning strategies, including picking order and obstacle avoidance, are essential for achieving higher harvesting efficiency as well as the safety of the robot during its interaction with the canopy. With advances in computing theory, path planning and control are becoming more reasonable and efficient (Jia et al., 2020). Various researchers have discussed different path planning and harvesting order strategies. The path of the robot can be established using the kinematic model of the manipulator, which calculates the displacement toward the target fruit position. The manipulator uses the inverse kinematics equations to establish the path under open-loop control (Yau & Wang, 1996) or visual servo control (Hashimoto, 2003). The kinematic model considers only the body dimensions of the robotic manipulator and the target position, so collisions remain possible during operation and could damage the robot or the tree. A separate set of algorithms is required to avoid collisions during operation. Task or harvesting order planning strategies have also been studied by many researchers. The most common method is to detect and localize the target fruit, with the path for each harvesting cycle starting from the home position of the manipulator (Roldan et al., 2018). Researchers have also developed harvest sequencing schemes to optimize the harvest cycle time. Formulating the task as a Traveling Salesman Problem (TSP) is a widely reported scheme for harvest sequencing. Yuan et al. (2009) implemented an algorithm that converts the apple harvesting task into a three-dimensional TSP to obtain the finite field information and then used an ant colony algorithm to optimize the path planning. 
Some other task planning schemes were also developed by researchers such as harvesting all detected fruits in the scene (Baeten et al., 2008) and optimal harvesting sequence by moving fruit-to-fruit for reducing the cycle time (Reed et al., 2001). Plebe and Anile (2002) obtained an efficient harvesting sequence plan by converting the harvesting task into twin traveling salesman problem (TTSP). All these path planning and task planning strategies could be feasible for reaching target following the optimized path. However, the manipulator collision with branches could still be a problem and needs to be addressed as it is essential for the safe operation of the fruit harvesting robot.
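To make the difference between the two sequencing ideas above concrete, the following minimal sketch compares a home-based cycle (the end-effector returns to the home position after every fruit) with a fruit-to-fruit sequence. The sequence is built with a simple nearest-neighbor heuristic, which is swapped in here for brevity and is not the ant colony or TTSP method of the cited studies. The fruit coordinates and function names are hypothetical, and straight-line distance is assumed as the travel cost.

```python
import math

def nearest_neighbor_order(home, fruits):
    """Greedy TSP heuristic: always visit the closest fruit not yet picked."""
    remaining = list(range(len(fruits)))
    order, current = [], home
    while remaining:
        nxt = min(remaining, key=lambda i: math.dist(current, fruits[i]))
        order.append(nxt)
        remaining.remove(nxt)
        current = fruits[nxt]
    return order

def fruit_to_fruit_length(home, fruits, order):
    """Travel distance when moving home -> fruit -> fruit -> ... (no return)."""
    points = [home] + [fruits[i] for i in order]
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def home_based_length(home, fruits):
    """Travel distance when every harvesting cycle starts and ends at home."""
    return sum(2.0 * math.dist(home, f) for f in fruits)

# Hypothetical fruit positions (metres) relative to the manipulator home position.
home = (0.0, 0.0, 0.0)
fruits = [(0.5, 0.2, 1.0), (0.1, 0.1, 0.3), (0.6, 0.3, 1.1), (0.2, 0.0, 0.4)]

order = nearest_neighbor_order(home, fruits)
print("picking order:", order)
print("fruit-to-fruit travel (m):", round(fruit_to_fruit_length(home, fruits, order), 3))
print("home-based travel (m):", round(home_based_length(home, fruits), 3))
```

The nearest-neighbor rule gives no optimality guarantee; it only shows how treating the detected fruits as one tour can shorten the total travel compared with restarting each cycle from the home position.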

Tree fruit canopies usually have a complex structure, with branches growing in random directions and orientations, which limits the manipulation capabilities of robotic manipulators. To ensure safe and successful robotic operation, it is essential to establish collision-free paths for the robot movement. A collision-free path refers to the movement of the manipulator and end-effector toward the target fruit without hitting the branches. In recent years, the challenges of obstacle detection and collision avoidance for tree fruit harvesting robots have gained interest from researchers. Obstacle detection is performed by the machine vision system, using sensors such as cameras and proximity sensors. Collision detection sensors can be integrated with the end-effector, such as a position sensor in an apple harvesting robot (Zhao et al., 2011), a light detection and ranging (LiDAR) sensor in a cherry harvesting robot (Tanigaki et al., 2008), and a camera in a litchi harvesting robot (Cao et al., 2019). The obstacle avoidance task, however, presents more challenges. For collision-free path planning, many researchers have proposed algorithms based on grids, neural networks, and random sampling. Grid-based algorithms such as A*, Phi*, or ant colony are suitable for multi-objective problems, but they are computationally expensive in complex environments and give satisfactory results only for manipulators with up to two or three DoFs (Nash et al., 2009). As the number of DoFs of the manipulator increases, the computational complexity and planning time increase exponentially (Choset et al., 2005). As mentioned earlier, a tree fruit harvesting robot should have at least six or seven DoFs to give the manipulator enough flexibility in its poses to avoid obstacles. Most of these grid-based path planning algorithms may therefore not be suitable for agricultural applications. Conversely, sampling-based planning approaches such as the rapidly exploring random tree (RRT), RRT*, or bi-directional RRT are probabilistically complete algorithms, i.e., if a solution exists, the probability of finding a path approaches one as the number of samples grows. They perform better on high-dimensional complex problems and are less influenced by the number of DoFs of the manipulator.
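As a rough illustration of how an RRT grows a tree by random sampling, the sketch below plans a path for a point in a two-dimensional unit square with one circular obstacle standing in for a branch. This is a deliberately simplified assumption: a real harvesting manipulator would be planned in its six- or seven-dimensional joint space with a full collision model. The step size, goal bias, tolerance, and all names are hypothetical choices, and collision checking is approximated by sampling points along each segment.

```python
import math
import random

def collision_free(p, q, obstacles, step=0.01):
    """Approximate check: sample points along segment p-q against circular obstacles."""
    n = max(1, int(math.dist(p, q) / step))
    for k in range(n + 1):
        t = k / n
        x = p[0] + t * (q[0] - p[0])
        y = p[1] + t * (q[1] - p[1])
        if any(math.hypot(x - cx, y - cy) <= r for cx, cy, r in obstacles):
            return False
    return True

def rrt(start, goal, obstacles, step=0.05, goal_tol=0.05, max_iters=5000, seed=0):
    """Basic RRT in the unit square; returns a list of waypoints or None."""
    rng = random.Random(seed)
    nodes, parent = [start], {0: None}
    for _ in range(max_iters):
        # Sample a random point, occasionally the goal itself (10 % goal bias).
        sample = goal if rng.random() < 0.1 else (rng.random(), rng.random())
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        # Extend the tree from the nearest node by at most one step toward the sample.
        scale = min(step, d) / d
        new = (near[0] + scale * (sample[0] - near[0]),
               near[1] + scale * (sample[1] - near[1]))
        if not collision_free(near, new, obstacles):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        if math.dist(new, goal) <= goal_tol and collision_free(new, goal, obstacles):
            # Walk back through the parents to recover the path.
            path, i = [goal], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None

start, goal = (0.1, 0.1), (0.9, 0.9)
obstacles = [(0.5, 0.5, 0.2)]  # (centre x, centre y, radius)
path = rrt(start, goal, obstacles)
print("waypoints found:", None if path is None else len(path))
```

Only the sampling range and the collision check depend on the dimension of the search space, which is one way to see why this family of planners scales more gracefully with the number of DoFs than a grid that must be discretized along every joint axis.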

Nowadays, RRT-based search algorithms are widely adopted for collision-free path planning in agricultural environments. Nguyen et al. (2013) proposed a framework for motion and hierarchical task planning of a nine-DoF manipulator for harvesting apples. The strategy was first implemented in simulation, and real-time communication between sensing and execution was then successfully established in the orchard environment. The authors compared seven different sampling-based planning algorithms, including RRT and RRT-Connect, and concluded that RRT-Connect was the most efficient for path planning in terms of processing time. It should be noted, however, that a nine-DoF manipulator has enough flexibility in its pose to avoid collisions with branches. Cao et al. (2019) also successfully used RRT for collision-free path planning of a six-DoF litchi harvesting robot. The paths calculated by sampling-based algorithms are generally not optimal, and the algorithms can suffer from slow convergence and long processing times. These deficiencies could be reduced by combining them with optimization algorithms such as the genetic algorithm (GA) to lower the path cost (LaValle, 2006). In addition, random sampling-based algorithms tend to produce longer paths because of their intrinsic search properties (generating and connecting random nodes in the search space). A path smoothing method, which aims to omit unnecessary nodes (Zahid et al., 2020c), can be used to reduce the length of the collision-free path. For random sampling-based search algorithms, the path planning time depends on the sampling resolution, which should be optimized with the required path success rate in mind.
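The idea of smoothing a sampled path by omitting unnecessary nodes can be sketched with a simple greedy shortcut rule: from each kept waypoint, jump to the farthest later waypoint that can be reached in a straight line without collision. This is a generic illustration under stated assumptions and not necessarily the method of the cited work; the two-dimensional waypoints, the circular obstacle, and the function names are hypothetical, and the collision check is again approximated by sampling along the segment.

```python
import math

def collision_free(p, q, obstacles, step=0.01):
    """Approximate check: sample points along segment p-q against circular obstacles."""
    n = max(1, int(math.dist(p, q) / step))
    for k in range(n + 1):
        t = k / n
        x = p[0] + t * (q[0] - p[0])
        y = p[1] + t * (q[1] - p[1])
        if any(math.hypot(x - cx, y - cy) <= r for cx, cy, r in obstacles):
            return False
    return True

def smooth(path, obstacles):
    """Greedy shortcutting: keep only waypoints needed to stay collision-free."""
    out, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        # Try the farthest waypoint first and back off until the shortcut is free.
        while j > i + 1 and not collision_free(path[i], path[j], obstacles):
            j -= 1
        out.append(path[j])
        i = j
    return out

def path_length(path):
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

obstacles = [(0.5, 0.5, 0.2)]  # (centre x, centre y, radius)
# Hypothetical jagged path of the kind a random sampling planner might return.
raw_path = [(0.1, 0.1), (0.2, 0.3), (0.1, 0.5), (0.2, 0.8),
            (0.4, 0.9), (0.6, 0.8), (0.9, 0.9)]

smoothed = smooth(raw_path, obstacles)
print("waypoints:", len(raw_path), "->", len(smoothed))
print("length (m):", round(path_length(raw_path), 3), "->", round(path_length(smoothed), 3))
```

Because every shortcut replaces a chain of segments with a single straight segment between two of its waypoints, the smoothed path can never be longer than the original, although it is still not guaranteed to be the shortest collision-free path.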

12.5 Conclusions and Future Directions

Robotic harvesting systems have been investigated over the past decades, and advances in both technology and horticulture have made the adoption of these systems for agricultural applications much more promising. For tree fruit crops, tree structures in modern orchards are becoming much simpler with high-density canopy systems. These tree systems are much more robot-friendly for implementing robotic harvesting than conventional tree systems. Even with these trees, however, the harvesting task is still relatively complex owing to the nature of biological systems. A successful robotic harvesting system needs to be accurate, robust, fast, and preferably inexpensive. Therefore, the critical points for the success of robotic harvesting of fruit trees are the accuracy of fruit detection, the spatial requirement of the picking end-effector, and the efficiency of the picking operation (the time for fruit identification and the time for maneuvering the end-effector).

Current research on tree fruit harvesting robots has mainly focused on developing vision systems for accurate detection and localization of the target fruits. However, improvements in many other components, including manipulation control, optimized harvest sequencing, and obstacle avoidance, are also required. Robot path planning is critical for accurately reaching the target points, and it mainly involves three operations: manipulation control, task sequencing, and collision avoidance. With recent advances in computing technologies and control algorithms, there are many opportunities to develop efficient controls for tree fruit harvesting robots. As discussed earlier, vision-based manipulation control (open- and closed-loop) is critical for the robotic harvest operation because the target location is not known in advance, so the type of control scheme should be selected carefully to achieve the desired outcome. Open-loop manipulation control could be a good control scheme, but natural factors in the field such as wind could alter the position of the targeted fruit, making the environment dynamic and reducing robot performance. Conversely, visual servo control is computationally expensive and requires a highly accurate vision system for successful operation. As both types of vision-based control have advantages and limitations, a robotic harvester that combines open- and closed-loop (global and local) manipulation control could improve harvesting efficiency. In such a scheme, global path planning provides the initial guidance to start the robot motion, and once the end-effector reaches the proximity of the target fruit, the manipulation can be switched to local control for accurate positioning. Using the global control scheme, the harvesting robot can obtain information about the complete scene (all fruits) before the start of the operation, and the path can be calculated for multiple fruits simultaneously to reduce the cycle time. The local control, in turn, can guide the end-effector to attain the desired orientation to grasp the fruit, employing parallel computation.

In addition, harvest sequencing is essential to optimize the path length and cycle time for each fruit. TSP-based sequencing algorithms are a potential solution, and studies using TSP and its variants to optimize the path length and cycle time have been reported (Yuan et al., 2009). A redundant manipulator can perform well because it has infinitely many pose configurations for reaching a target point. For redundant manipulators, however, optimizing the task sequence with TSP may not be enough; the pose configuration also needs to be optimized, which could be addressed using TSP-N-based optimal path planning (Vicencio et al., 2014). Collision avoidance is also one of the biggest challenges for tree fruit harvesting robots. Researchers have implemented collision-free path algorithms for different robotic operations in tree fruits. Random sampling-based search algorithms such as RRT, RRT*, and bi-RRT are widely adopted for collision-free path planning because of their higher success rate, but the path solutions they produce are not always optimal. Recent advances in intelligence-based optimized search algorithms such as ant colony optimization (ACO), particle swarm optimization (PSO), and the genetic algorithm (GA) can help provide optimized collision-free path solutions in the constrained tree canopy environment. Providing multiple approach poses for the manipulator to reach the target could also improve collision avoidance (Zahid et al., 2020c). A redundant manipulator could likewise perform well for collision avoidance owing to its higher pose flexibility; however, additional DoFs increase the path finding time and the cost of the manipulator (Bac et al., 2017). Overall, adding obstacle avoidance to the path planning scheme increases the computational complexity of the robotic harvest operation. Thus, fast and efficient collision-free path algorithms are required to ensure successful and safe operation. An efficient fusion of path planning algorithms, including task sequencing and obstacle avoidance, is essential for successful robotic tree fruit operations.

As an emerging technology, machine vision combined with machine learning algorithms has become a crucial factor in the development of automatic harvesting robots. The complexity of harvesting robots has been reduced to a great extent by extensive progress in machine vision technologies, including advanced camera sensors and artificial intelligence (AI) algorithms. Depth-sensing cameras (e.g., RGB-D cameras based on time-of-flight or related principles) have been used in recent years and show promise for fruit recognition, but not all such cameras perform equally well. Studies have reported good accuracy with RealSense RGB-D cameras, but these cameras are highly sensitive to outdoor illumination and could provide low-resolution images. Higher-resolution cameras, including but not limited to the Microsoft Kinect and ZED stereo cameras, might be better options. Overcoming image acquisition problems caused by varying environmental conditions should be a key priority. Although many studies have used traditional machine learning (ML) for fruit detection, recent deep learning algorithms, including Faster-RCNN, Mask-RCNN, ResNet, and DenseNet, have been shown to outperform traditional ML algorithms in various agricultural studies. Deep learning algorithms running on graphics processing units (GPUs) have been widely applied to increase the computing power available for processing high-density data. Machine vision technology has been widely used in complicated and unknown plant environments because of its robustness and its ability to handle high complexity. At the same time, most existing machine vision-based systems have only been implemented in laboratory, semi-customized, or customized environments for experimentation, resulting in a large discrepancy between experimental and actual field conditions for fruit recognition. Because of this limitation, the adaptability of harvesting robots to complex and unstructured environments remains a major bottleneck that affects the maturity of harvesting robots and limits their application in orchard conditions. Therefore, more universal machine vision technologies need to be developed that can recognize fruits under any environmental condition.