2.1 Introduction

2.1.1 Importance of Modern Tree Canopy Management

The US tree fruit industry is an important component of the national agricultural sector, representing ~26% ($11 billion) of all specialty crop production (Perez & Plattner, 2015; USDA-NASS, 2015). The industry is highly labor-intensive and is becoming less sustainable due to rising labor costs and growing labor shortages (Calvin & Martin, 2010; Fennimore & Doohan, 2008). Proper canopy management through branch training and pruning is essential for developing machine-friendly tree architectures, which would allow the tree fruit industry to benefit from new and innovative robotic technologies.

New tree fruit orchards are increasingly planted in modern, high-density architectures that use dwarfing rootstocks and training systems designed for maximum sunlight interception, higher fruit yields and quality, and easier worker access (Milkovich, 2015; Warrington et al., 1996; Zhang et al., 2015). These training systems and rootstocks could improve the economics of growing highly productive trees with excellent fruit quality (Baugher, 2017; Schupp et al., 2017). The key to maximizing profitability in an orchard operation, however, is the ability to integrate these architectures with mechanized/robotic systems that can perform multiple and diverse tasks. Previous research has indicated that trellis-trained fruiting wall orchards are highly amenable to robotic/mechanized harvesting (He et al., 2017a, b; Silwal et al., 2016; Zhang et al., 2018a, b) and pruning (Zahid et al., 2020a, b). Some private companies, such as Abundant Robotics, Inc. (founded in 2015) and FFRobotics Ltd. (founded in 2014), are also pursuing robotic solutions for these high-density planar tree orchards. Well-managed tree canopies are therefore central to the successful implementation of mechanical and robotic operations in orchards.

2.1.2 Conventional Tree Canopy Management

Typically, tree canopy management is accomplished through training and pruning. Training begins at planting and may continue for several years to guide the trees into a specific canopy shape or structure. Pruning is the removal of branches to control tree size and to regulate fruit quality and yield; appropriate pruning can also improve pest and disease control. The operation is generally carried out during winter, when the leafless branches are easily visible (dormant pruning), although summer pruning, often in the form of hedging, is sometimes performed as well. Traditionally, both training and pruning are done manually by skilled workers. Figure 2.1a shows tree branch pruning using a long lopper, and Fig. 2.1b shows a branch being tied to a trellis wire with electrical tape to form a fruiting wall canopy.

Fig. 2.1

Manual pruning and training for apple tree canopy management. (a) Branch pruning using a long lopper and (b) canopy training by tying a branch onto trellis wire

Manual canopy management operations are labor-intensive and costly, and decisions vary from person to person depending on individual skill and experience. For pruning, workers make cutting decisions by considering branch diameter, number, distribution (density), and quality. The availability of farm labor is also becoming an issue for the tree fruit industry. To improve working efficiency, orchard platforms are used to reduce the time spent climbing ladders (Fig. 2.2a). Meanwhile, mechanical hedgers that remove the sides and tops of the canopies have been tested for fruit tree pruning (Fig. 2.2b). The success of hedging is limited by factors such as unwanted vegetative growth, reduced fruit quality, and higher fruit density (Martí & González, 2010; Webster, 1998). Mechanical pruning works well for evergreen fruit-bearing trees like citrus but has been found unsuitable for other fruit trees, whose complex architecture requires selective pruning (Childers, 1983). Robotic selective pruning would be a potential solution for these trees.

Fig. 2.2

Alternative pruning solutions for tree fruit orchards. (a) Pruning assist platform system (Bandit, Automated Ag) and (b) mechanical hedging system (FAMA hedger)

2.1.3 Tree Fruit Production Mechanization with Modern Tree Canopies

An integrated robotic system for tree fruit production generally includes robot-canopy interaction for creating a collision-free path for the robot to reach the target, a machine vision system to provide object detection, a manipulator to position the end-effector, and an end-effector to perform the task (Fig. 2.3). Maneuvering a mechanical system (robotic arm/manipulator) inside the tree canopy to reach target positions, such as a fruit or a branch, and perform the desired tasks is referred to as robot-canopy interaction. For an agricultural robot, environmental perception is provided by a sensing system, followed by manipulation and control of the mechanical system to reach the targets. However, maneuvering the manipulator in a constrained agricultural workspace poses great challenges. The crucial elements of superior robot-canopy interaction include the kinematic dexterity and spatial requirements of the manipulator, manipulation controls, path planning, and obstacle avoidance. Current research on agricultural manipulators has mainly focused on developing fast and efficient machine vision systems for the recognition and localization of targets. In addition, efforts are underway to improve manipulation controls and to optimize path planning and obstacle avoidance. For efficient mechanical or robotic operation, it is important to precisely reconstruct the tree canopy environment and understand the interaction between the canopy and the robot so that a collision-free path can be developed.

Fig. 2.3

The illustration of an integrated robotic system for tree fruit production

2.2 Robot-Canopy Interaction

2.2.1 Kinematic Dexterity and Spatial Manipulation Requirements

A manipulator/robotic arm is a mechanical system comprising links connected by joints that performs tasks in one-, two-, or three-dimensional workspaces. The manipulator positions the end-effector close to the target. The last joint of the manipulator is usually connected to an end-effector unit that performs the required task (Kondo et al., 1993). The manipulator is defined in terms of its degrees of freedom (DoFs), link length, link angle, and link offset. Each joint in the manipulator has one DoF, and the kinematic dexterity and spatial requirements are directly related to the number and type of DoFs used in the manipulator assembly (Bac et al., 2017; Burks et al., 2018). Industrial manipulators are well suited to performing repetitive tasks with uniform objects in a free workspace, but agriculture is a complex, dynamic environment in which the objects involved vary in shape, size, position, and orientation (Simonton, 1991). Thus, the adoption of robotics for fruit tree canopies faces many challenges, which require a better match between the manipulator's capabilities and its workspace environment (Kondo & Ting, 1998; Simonton, 1991). For efficient manipulation in an agricultural environment, manipulators should be designed with their intended applications in mind, followed by optimization of the kinematic framework for those applications (Kondo & Ting, 1998). However, optimizing manipulator kinematics is challenging due to natural variability between tree architectures and the available workspace for maneuvering. The manipulator could be designed based on various configurations, such as the type of joints and required DoFs, which affect the kinematic dexterity and spatial requirements during manipulation (Bac et al., 2017; Zahid et al., 2020b). Considering the tree canopy environment, the selection of a suitable configuration is critical for efficient robot-canopy interaction.
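
As a concrete illustration of how link length, link angle, and link offset define a manipulator's kinematics, the following minimal sketch composes per-joint Denavit-Hartenberg (DH) transforms into a forward-kinematics function. The DH table and link lengths are hypothetical values chosen only for demonstration and do not correspond to any manipulator cited in this chapter.

```python
# Illustrative sketch: forward kinematics of a simple 3-DoF arm from standard
# Denavit-Hartenberg parameters (hypothetical link values, for demonstration only).
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform for one joint from its DH parameters:
    joint angle theta, link offset d, link length a, link twist alpha."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(joint_angles, dh_table):
    """Compose the per-joint transforms to get the end-effector pose."""
    T = np.eye(4)
    for theta, (d, a, alpha) in zip(joint_angles, dh_table):
        T = T @ dh_transform(theta, d, a, alpha)
    return T  # 4x4 pose of the last link expressed in the base frame

# Hypothetical DH table entries (d, a, alpha) for a 3R arm, lengths in meters
dh_table = [(0.10, 0.00, np.pi / 2), (0.00, 0.40, 0.0), (0.00, 0.30, 0.0)]
pose = forward_kinematics([0.3, -0.6, 1.0], dh_table)
print(np.round(pose[:3, 3], 3))  # Cartesian position of the tool mount
```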

In the past decade, researchers have developed several manipulators to carry out different operations on tree fruits, such as harvesting (Silwal et al., 2017; Sivaraman, 2006; Zhang & Schueller, 2015) and pruning (Botterill et al., 2017; Zahid et al., 2020b). Considering the total DoF, three-DoF manipulators have been the most common choice due to their simple design and control (Harrell et al., 1990). These manipulators could reach target locations inside the canopy using inverse kinematics, but the orientation of the end-effector tool could not be altered because of the low number of DoFs. This limited dexterity could result in poor operational performance during robot-canopy interaction, especially when the targets are occluded behind leaves or branches. Adding more DoFs, e.g., a four-DoF manipulator for cherry harvesting (Tanigaki et al., 2008) or a five-DoF manipulator for apple tree pruning (Zahid et al., 2020a), could enhance the manipulator's ability to adjust the orientation of the end-effector to some extent. However, the possible orientations of the end-effector tool at any target point in the manipulator workspace are still limited. Considering the constrained workspace inside tree canopies, these low-DoF manipulators may not be suitable for harvesting or pruning in the presence of obstacle branches.

A manipulator with six DoFs (Botterill et al., 2017) can reach positions in Cartesian space (x, y, and z) at any desired angular (yaw, pitch, and roll) components (Corke, 2017). However, for such manipulators, the inverse kinematics yields two poses (elbow up and elbow down) for any desired target position and orientation. This increases the control complexity during collision avoidance and can lead to damage to the manipulator, fruit, and/or branches (Burks et al., 2018). For manipulators with up to six DoFs, another challenge is that only a limited number of poses can be attained at any point in the workspace, which may be insufficient to avoid obstacles (Burks et al., 2018). For efficient robot-canopy interaction, however, the robot should ideally be able to avoid all obstacles while maneuvering to reach the target fruits and branches. A manipulator with at least one excess DoF, such as seven DoFs (Mehta et al., 2014; Silwal et al., 2016), referred to as a redundant manipulator, could be a solution for collision avoidance. Redundant manipulators have an infinite number of poses for any target position in the workspace and could avoid obstacles by changing to an optimal pose, presenting a promising approach for developing fruit tree manipulators (Burks et al., 2018). The additional DoF enhances the kinematic dexterity and manipulability that are essential for obstacle avoidance, but it also exponentially increases the complexity of the manipulation controls (Choset et al., 2005).
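
The multiplicity of inverse-kinematics solutions mentioned above is easiest to see in a planar two-link arm, where every reachable point admits an elbow-up and an elbow-down configuration. The sketch below, with hypothetical link lengths, computes both branches; a full 6-DoF manipulator has analogous (and more numerous) solution branches.

```python
# Illustrative sketch: the two "elbow-up"/"elbow-down" inverse-kinematics
# solutions of a planar two-link arm (hypothetical link lengths).
import numpy as np

def two_link_ik(x, y, l1=0.4, l2=0.3):
    """Return the (theta1, theta2) pairs that place the wrist at (x, y)."""
    c2 = (x**2 + y**2 - l1**2 - l2**2) / (2 * l1 * l2)
    if abs(c2) > 1:
        return []  # target lies outside the reachable workspace
    solutions = []
    for s2 in (np.sqrt(1 - c2**2), -np.sqrt(1 - c2**2)):  # elbow up / elbow down
        theta2 = np.arctan2(s2, c2)
        theta1 = np.arctan2(y, x) - np.arctan2(l2 * s2, l1 + l2 * c2)
        solutions.append((theta1, theta2))
    return solutions

for t1, t2 in two_link_ik(0.45, 0.25):
    print(f"theta1 = {np.degrees(t1):6.1f} deg, theta2 = {np.degrees(t2):6.1f} deg")
```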

Manipulators for fruit trees can also be categorized by their joint types. The performance of a manipulator is influenced by the selection of joint types, such as prismatic, revolute, or their combinations, which affect the manipulator's workspace, dexterity, and spatial capabilities during manipulation (Bac et al., 2017). The manipulator should have small spatial requirements during manipulation to ensure efficient robotic operation in the complex canopy environment. While maneuvering inside the canopy, each joint contributes to altering the manipulator's pose and orientation. The parts of the manipulator that contribute most to its pose change are referred to as the positioning links, and the part that adjusts the orientation of the end-effector tool is referred to as the wrist. The greater the pose change, the higher the chance of collision with branches; thus, the joints for the positioning links should be selected to minimize pose change. Zahid et al. (2020b) developed a manipulator for tree pruning by combining revolute and prismatic joints. The revolute joints were added directly to the end-effector to reduce the spatial requirements during maneuvering, and the prismatic joints were used to position the end-effector while avoiding obstacles. Because prismatic joints are associated with low pose change, this configuration could enable collision avoidance without the need for redundant manipulators. Figure 2.4 shows a few examples of different configurations for a six-DoF manipulator integrated with a spherical-wrist shear pruner end-effector. The first three joints could be used for Cartesian positioning (x-, y-, and z-axes) and the last three joints for adjusting the orientation of the end-effector. Each of the manipulators shown has a different workspace (noted in the figure caption) and different spatial requirements during manipulation inside the canopy. For example, the positioning joints of the Cartesian system shown in Fig. 2.4a may operate outside the canopy with only a slight pose change, reducing the space required to maneuver the end-effector to targets within the tree canopy. Similarly, the other joint combinations (Fig. 2.4b–d) affect the manipulator's pose change differently while maneuvering within the tree canopy to reach a target. In addition, the manipulator design should consider tree features, such as canopy size and structure, to reduce the potential for collisions with branches.

Fig. 2.4

Illustration of a manipulator having different joint configurations integrated with spherical wrist (RRR) end-effector: (a) Cartesian (PPP), (b) cylindrical (PPR), (c) articulated (RRR), and (d) spherical (RRP)

2.2.2 Manipulation Controls

Information about the surrounding environment gathered by a sensing system is provided to the manipulator for efficient manipulation control, also referred to as vision-based control. Vision-based manipulation provides essential information, such as the position and orientation of the target objects and the obstacles. This information is particularly important for fruit tree operations, in which the positions and orientations of targets, such as fruits and branches, are highly variable. The manipulator can use the visual information both to reach the target accurately and to avoid obstacles (Zhao et al., 2016). Any inefficiency in vision-based control can reduce the performance of the robotic manipulator and should therefore be given serious attention. Advances in sensing technologies and control algorithms are leading the way to improved controls for agricultural manipulators.

Vision-based controls are grouped into two classes: the global viewing or eye-hand coordination system and the visual navigation or visual servo control system (Zhao et al., 2016). Researchers have used both types of vision-based manipulation control for agricultural operations. The global viewing system, also referred to as open-loop control, operates on a “fixed-point looking followed by moving” scheme: the sensing system scans the entire scene to gather information about the surrounding environment, and the manipulator then moves to the target. For open-loop controls, the positioning accuracy depends on the correctness of the information gathered from sensors, such as cameras, as well as on the accuracy of the kinematic model of the manipulator (Yau & Wang, 1996). Botterill et al. (2017) used an open-loop control scheme to establish manipulation control for pruning grapevines. Silwal et al. (2017) used an RGB-D camera to establish open-loop visual control of manipulation for apple harvesting. These studies reported the accumulation of position and calibration errors due to inefficiencies of the vision system. To achieve higher positioning accuracy, the open-loop system can be integrated with other sensory information, such as range, proximity, and position sensors, to precisely measure the distance to the target (Zhao et al., 2011; Ringdahl et al., 2019). Han et al. (2012) successfully established open-loop visual control of manipulation using an RGB stereoscopic camera and a laser sensor to measure the distance to the target, with a positioning error of less than 1 mm. Because there is no position feedback in an open-loop control system, manipulation efficiency is usually expected to be lower in a dynamic agricultural environment, where wind or other disturbances can change the fruit or branch position.
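
The open-loop scheme can be summarized in a few lines: sense once, transform the detected target from the camera frame to the robot base frame through a pre-calibrated transform, and issue a single motion command. The sketch below is a minimal illustration of this "look-then-move" flow; the calibration matrix, target coordinates, and move_to_pose() function are hypothetical placeholders rather than any system described above.

```python
# Illustrative sketch of open-loop ("look-then-move") control: one sensing step,
# one frame transform, one motion command, and no feedback thereafter.
import numpy as np

# Hypothetical pre-calibrated homogeneous transform: camera frame -> robot base frame
T_base_camera = np.array([
    [0.0, -1.0, 0.0, 0.35],
    [1.0,  0.0, 0.0, 0.00],
    [0.0,  0.0, 1.0, 0.55],
    [0.0,  0.0, 0.0, 1.00],
])

def move_to_pose(xyz):
    """Placeholder for the manipulator motion command (e.g., an IK call)."""
    print("Commanding end-effector to", np.round(xyz, 3))

# One-shot sensing: target fruit position reported by the camera (meters, homogeneous)
target_in_camera = np.array([0.12, -0.05, 0.80, 1.0])

# Open-loop step: transform once, move once; any calibration or kinematic error
# appears directly as positioning error because there is no feedback.
target_in_base = T_base_camera @ target_in_camera
move_to_pose(target_in_base[:3])
```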

The second class of vision-based control is closed-loop or feedback-based control, also referred to as visual servo control (Corke & Hager, 1998). Visual servo control operates on a scheme of “simultaneous looking and moving,” or “on-the-fly sensing,” making it a fully dynamic system. A sensor-in-hand system provides on-the-fly information about the position and orientation of the target and the end-effector, which is then used for manipulation control (Hashimoto, 2003). A major advantage of closed-loop control is that the manipulation performance is unaffected by the accuracy of the kinematic model and the calibration of the vision-manipulator system. Harrell et al. (1990) and Mehta and Burks (2014) implemented visual servo control using a fixed camera for a citrus-harvesting robot, with a position accuracy of 15 mm. However, because manipulation is controlled solely from the on-the-fly sensor information, the performance depends on the accuracy of the vision system. Zhao et al. (2011) successfully implemented visual servo control using a charge-coupled device (CCD) camera in an eye-in-hand configuration for an apple-harvesting robot. You et al. (2020) used an eye-in-hand RGB-D camera configuration to execute visual servo manipulation control for pruning sweet cherry. These studies reported that the depth estimates from the vision system were not always accurate, resulting in lower positioning accuracy in reaching the targets. In general, visual servo control performs better than open-loop control across different applications; however, it still requires high target localization accuracy for good manipulation control. Furthermore, because images must be acquired repeatedly throughout the operation, a closed-loop control system usually has a higher processing time (Silwal et al., 2017). One key consideration for achieving the desired performance is to match the bandwidth of the controllers with the frame rate of the visual information from the camera sensing system.
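
To make the "simultaneous looking and moving" idea concrete, the sketch below simulates a greatly simplified image-based visual servo loop: at every iteration the target is re-projected through an assumed pinhole camera model, and the end-effector position is updated proportionally to the image-plane error. The intrinsics, gain, and translation-only motion model are hypothetical simplifications, not the controllers of the cited studies.

```python
# Illustrative sketch of a simplified image-based visual servo loop
# (hypothetical pinhole camera, proportional gain, translation-only motion).
import numpy as np

fx = fy = 600.0          # assumed focal length in pixels
cx, cy = 320.0, 240.0    # principal point of a 640 x 480 image

def project(p_cam):
    """Pinhole projection of a 3D point in the camera frame to pixel coordinates."""
    return np.array([fx * p_cam[0] / p_cam[2] + cx,
                     fy * p_cam[1] / p_cam[2] + cy])

target = np.array([0.10, -0.04, 0.60])   # fruit position in the camera frame (m)
eef = np.zeros(3)                        # eye-in-hand camera starts at the origin
gain = 0.4                               # proportional servo gain

for step in range(100):
    rel = target - eef                   # target as currently seen by the camera
    error_px = project(rel) - np.array([cx, cy])
    if np.linalg.norm(error_px) < 1.0 and rel[2] < 0.05:
        break                            # target centered in the image and close enough
    # Steer laterally from the pixel error and advance along the optical axis
    eef += gain * np.array([error_px[0] * rel[2] / fx,
                            error_px[1] * rel[2] / fy,
                            0.2 * rel[2]])

print(f"Stopped after {step + 1} iterations; residual offset (m): {np.round(target - eef, 4)}")
```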

A comparison of the two types of control is presented in Table 2.1. Both types of visual control have advantages and drawbacks; the choice of manipulation control depends on the intended task and the test environment. Additionally, because agriculture is a dynamic and unstructured environment, natural factors, such as wind, should be considered when selecting a manipulation control scheme. Given the limitations of both schemes, a combination of open- and closed-loop control could be a possible solution for manipulation control in fruit trees. Font et al. (2014) combined open-loop and visual servo control in their study: the end-effector first moved quickly into the proximity of the target under open-loop control, and its position and orientation were then adjusted at the target using on-the-fly guidance from the visual servo controller.

Table 2.1 A comparison of two types of visual manipulation controls

2.2.3 Path Planning and Task Sequencing

The prioritization or sequencing of tasks, such as harvesting fruits or pruning branches in an optimal order, is an important element of robot-canopy interaction. The optimal order can be developed based on various parameters, including minimum rotation of the manipulator's joints, least collision in the workspace, shorter path length, and/or minimal time to reach the target. This optimal sequencing of robot tasks, also referred to as path or motion sequence planning, is essential to achieve high performance and to ensure safe operation of the robot during interaction with the canopy (Raja & Pugazhenthi, 2012). In agriculture, path planning is crucial for successful operation and should be understood in terms of the type of obstacle environment. Path planning can be categorized into two groups: offline and online path planning (Zhao et al., 2016). Offline path planning, also referred to as a global path strategy (commonly implemented with a global camera system), requires complete information about the environment before pathfinding begins. For a constrained workspace where collision avoidance and task sequencing are essential, this approach can be implemented for a static environment (stationary obstacles). Online path planning, referred to as local planning, gathers information about the scene as the robot moves through the environment. In this strategy, pathfinding starts offline and then switches to online mode during manipulation using closed-loop feedback (Zhao et al., 2016). This strategy is useful for the dynamic obstacles likely to occur in the agricultural environment.

The most common path-establishing strategy is to reach the target without using any search algorithm (Jia et al., 2020). The kinematic model of the manipulator is used to calculate the displacement toward the target, and the path is established using inverse kinematics based on open-loop control (Yau & Wang, 1996) or visual servo control (Hashimoto, 2003). However, these path strategies do not consider task sequencing or obstacles in the workspace, so obstacle avoidance is unlikely. In recent years, with advancements in computing, path planning combined with task sequencing has become more efficient. Researchers have reported numerous task sequencing strategies for different tree fruits. The most common method is to detect and localize the target, followed by pathfinding and execution for an individual harvest cycle starting from the manipulator's home position (Roldan et al., 2018). Zahid et al. (2020c) implemented a similar individual-cycle approach for pruning apple tree branches. This single-cycle path strategy reduces performance because the path execution time increases. Other researchers have reported task planning in which all fruits detected in the scene are harvested in one pass. Baeten et al. (2008) and Reed et al. (2001) used an all-in-one-cycle task planning strategy to reduce cycle time by moving directly from target to target.

In addition to task planning, researchers have reported different strategies for optimizing task sequencing in tree fruit operations. For tree fruits, this means sequencing pruning cuts, fruit harvesting, or fruit thinning actions to optimize path length or cycle time. A path-minimizing strategy based on the Traveling Salesman Problem (TSP) is widely adopted for optimizing task sequencing (Applegate et al., 2011). Yuan et al. (2009) implemented a TSP solver by converting the apple harvesting task into a three-dimensional problem to optimize the harvesting sequence. You et al. (2020) implemented a TSP solver for cut point sequencing in pruning sweet cherry and executed the optimal sequence with a high success rate of 92% and a cycle time of 13 s per branch. Additionally, researchers have presented several variants of the TSP solver, including Twin-TSP (T-TSP), TSP with Neighborhoods (TSP-N), TSP with Neighborhoods and Duration visits (TSP-ND), and Generalized TSP with Neighborhoods (G-TSP-N), to optimize manipulator poses, path length, and cycle time. An efficient harvesting sequence plan was implemented by Plebe and Anile (2002) by converting the harvesting task into a T-TSP and optimizing it to avoid twin collisions using a self-organizing map model. Jang et al. (2017) developed a TSP-N solver for path sequencing in a dynamic obstacle environment, aiming to improve path quality and reduce cycle time. These sequencing and optimization strategies can find an optimal order and reduce cycle time and path length; however, manipulator collisions with branches may still be problematic.
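
As a minimal illustration of task sequencing, the sketch below orders a set of hypothetical 3D cut points with a nearest-neighbor heuristic, a simple greedy approximation of the TSP formulations cited above. A production planner would typically use a proper TSP solver and a cost metric that also accounts for joint motion and obstacle constraints.

```python
# Illustrative sketch (hypothetical data): nearest-neighbor sequencing of
# pruning cut points as a greedy approximation of TSP-based task ordering.
import numpy as np

def nearest_neighbor_sequence(points, start):
    """Order target points by repeatedly visiting the closest unvisited one."""
    remaining = list(range(len(points)))
    order, current = [], np.asarray(start, dtype=float)
    while remaining:
        dists = [np.linalg.norm(points[i] - current) for i in remaining]
        nxt = remaining.pop(int(np.argmin(dists)))
        order.append(nxt)
        current = points[nxt]
    return order

# Hypothetical 3D cut-point coordinates (meters) and manipulator home position
cut_points = np.array([[0.4, 0.2, 1.1], [0.1, -0.3, 1.4],
                       [0.5, 0.0, 0.9], [0.2, 0.1, 1.6]])
home = [0.0, 0.0, 1.0]
print("Visit order of cut points:", nearest_neighbor_sequence(cut_points, home))
```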

2.2.4 Obstacle Avoidance

The path followed by the manipulator from the start point to the target point without hitting any obstacles is referred to as a collision-free path. In the tree fruit environment, the obstacles are generally branches and leaves, and manipulation in their presence is a great challenge. Path planning and obstacle avoidance must therefore be given attention for successful robotic operation on tree fruits. The term collision avoidance is sometimes used interchangeably with path planning; in reality, however, collision avoidance requires a separate set of considerations for path planning in a constrained environment. The complexity of path planning increases dramatically with the addition of obstacle detection and avoidance components. In recent years, researchers have gained interest in obstacle detection and avoidance for collision-free robot path planning in the agricultural environment. Obstacle detection is performed by the machine vision system using sensors such as cameras, proximity sensors, and laser sensors. Researchers have integrated obstacle detection sensors with harvesting manipulators, such as a camera for litchi (Cao et al., 2019), a proximity sensor for apple (Zhao et al., 2011), and a Light Detection and Ranging (LiDAR) sensor for cherry (Tanigaki et al., 2008). After detection, the next critical task is to avoid the obstacles while maintaining the manipulator pose required to perform the specified task.

The collision-free path search strategies are categorized into four groups, namely, geometric (grid), probabilistic (random sampling), Artificial Potential Field (APF), and intelligence-based search algorithms (Li et al., 2019). These search algorithms have advantages and drawbacks in terms of path success, search space complexity, processing time, and path optimization (Hwang & Ahuja, 1992; Kaluđer et al., 2011; Kanehara et al., 2007; Yang & Luo, 2004). A performance comparison of different search algorithms is presented in Table 2.2.

Table 2.2 A comparison of different search algorithms

Geometric search algorithms are suitable for multi-objective problems, but they give satisfactory results only for manipulators with up to two or three DoFs (Nash et al., 2009). Probabilistic search approaches are sampling-based algorithms and are among the most successful methods (Li et al., 2019); they are less affected by the number of DoFs of the manipulator but sometimes provide suboptimal solutions (Janson et al., 2017). The Artificial Potential Field (APF) search works under the influence of attraction and repulsion forces: the potential functions generate attractive forces from the target and repulsive forces from the obstacles (Khatib, 1986). Intelligence-based search solves multi-objective problems, such as obstacle avoidance along an optimal path, using heuristic or learned information (Noreen et al., 2016).

Researchers have put forward many strategies for collision-free path planning in the agricultural environment. Van Henten et al. (2003) used the A* search method (Table 2.2) for collision-free path planning of a seven-DoF manipulator, but such search methods generally give satisfactory results only for manipulators with up to three DoFs (Noreen et al., 2016). For efficient manipulation and collision avoidance, the manipulator should have at least six DoFs; however, as the number of DoFs increases, the computational complexity and search time increase exponentially (Choset et al., 2005). Thus, these search algorithms may not be suitable for robotic operations in a tree fruit environment. Luo et al. (2018) investigated an APF-based search algorithm for collision-free path planning to harvest grapes. These methods achieved high success rates, but their drawbacks were high processing time and non-optimal path solutions.

In recent years, many researchers have investigated probabilistic search algorithms such as the rapidly exploring random tree (RRT) and Bi-RRT due to their high pathfinding success and applicability to multi-DoF (up to 12-DoF) manipulators (Cao et al., 2019; LaValle, 1998). You et al. (2020) investigated the Batch Informed Trees (BIT*) algorithm for pruning grapevines using a six-DoF manipulator and achieved high pathfinding success. The RRT-based search approach is by far the most common strategy for collision-free pathfinding in a tree fruit environment. Botterill et al. (2017) and Zahid et al. (2020c) implemented RRT for the collision-free path planning of grapevine- and apple-pruning robots, respectively. In addition, multiple variants of RRT-based algorithms have been investigated for collision-free robot path planning in the agricultural environment. Nguyen et al. (2013) implemented an RRT-based collision-free path planning framework to harvest apples using a nine-DoF manipulator; the authors compared several algorithms and concluded that RRT-Connect was the most efficient in terms of processing time. Cao et al. (2019) combined the RRT algorithm with a genetic algorithm (GA) for optimized path planning to harvest litchi. RRT paths are usually longer than necessary due to the random nature of the search; an RRT search combined with path smoothing and optimization algorithms was implemented by Zahid et al. (2020c) to reduce path length and search time. Bac et al. (2017) implemented Bi-RRT to establish collision-free paths for harvesting sweet pepper in a controlled greenhouse environment. These RRT-based studies reported high success rates for collision-free path creation in the agricultural environment, with varying processing times. The path planning time depends on the sampling resolution, which should be optimized against the required path success rate.
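
To make the RRT idea concrete, the sketch below grows a minimal 2D tree from a start point toward a goal while rejecting edges that pass through circular obstacle stand-ins. The obstacle positions, step size, and goal bias are hypothetical; the cited pruning and harvesting planners run the same principle in the joint space of a multi-DoF manipulator with geometric collision checking.

```python
# Illustrative sketch (not the cited implementations): a minimal 2D RRT with
# circular "branch" obstacles and a simple segment-sampling collision check.
import math
import random

OBSTACLES = [(0.5, 0.5, 0.15), (0.3, 0.8, 0.10)]   # (x, y, radius), hypothetical
STEP, GOAL_TOL, MAX_ITER = 0.05, 0.05, 5000

def collision_free(p, q, samples=10):
    """Check intermediate points of segment p-q against all obstacles."""
    for i in range(samples + 1):
        t = i / samples
        x, y = p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])
        for ox, oy, r in OBSTACLES:
            if math.hypot(x - ox, y - oy) <= r:
                return False
    return True

def rrt(start, goal):
    nodes, parent = [start], {0: None}
    for _ in range(MAX_ITER):
        # Goal-biased random sampling of the unit square workspace
        sample = goal if random.random() < 0.1 else (random.random(), random.random())
        near_i = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[near_i]
        d = math.dist(near, sample)
        if d < 1e-9:
            continue
        new = sample if d <= STEP else (near[0] + STEP * (sample[0] - near[0]) / d,
                                        near[1] + STEP * (sample[1] - near[1]) / d)
        if not collision_free(near, new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = near_i
        if math.dist(new, goal) < GOAL_TOL:
            path, i = [], len(nodes) - 1
            while i is not None:              # walk back to the root to recover the path
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None

path = rrt((0.05, 0.05), (0.9, 0.9))
print("Path found with", len(path) if path else 0, "waypoints")
```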

2.3 Tree Canopy In-Field Sensing and 3D Reconstruction for Mechanization

2.3.1 In-Field Sensing Technologies

Advanced machine vision systems have been implemented to further develop mechanized equipment for tree fruit production tasks, such as pruning, training, and harvesting, where human input is currently needed throughout each process. There are normally two types of approaches, namely, tree canopy 3D reconstruction and target object identification, depending on the agricultural task. Under field light conditions and complex planting patterns, tree canopies need to be reconstructed in their entirety for some mechanized tasks to support canopy characterization, localization, path planning, and geometry measurements. Various technologies have been developed for in-field sensing, such as photogrammetry and Light Detection and Ranging (LiDAR).

2.3.1.1 Photogrammetry Imaging for 3D Reconstruction

Photogrammetry is one of the most effective and affordable methods. One common approach is to use a binocular stereo vision system to reconstruct the target canopy or plant. Ni et al. (2016) developed a stereo vision system with two high-definition (HD) cameras (LifeCam Studio, Microsoft, Redmond, WA, USA) mounted in parallel. For 3D reconstruction, multiple images from different angles and views need to be taken around the target using the Structure-from-Motion (SfM) method. The results showed that the true size of a target, such as a small lemon tree with leaves, could be reconstructed. In addition, time-of-flight (ToF) 3D cameras have often been used, and studies have shown that they can achieve more accurate results than stereo vision systems for canopy reconstruction (Beder et al., 2007). Karkee and Adhikari (2015) developed a method for identifying apple tree trunks and branches for automated pruning using a ToF camera (CamCube 3.0, PMD Technologies, Siegen, Germany) mounted on a pan-and-tilt system under field conditions. With the camera located approximately 1.27 m away from the target trees, all tree trunks and 77% of branches were successfully detected through canopy reconstruction. It is worth noting that the target trees were young trees spaced about 0.46 m apart and trained in a tall spindle fruiting wall architecture. Karkee et al. (2014) used the same sensing equipment and compared the algorithm-based pruning results against those of human workers in the field. Results showed a root-mean-square deviation (RMSD) of 13% in branch spacing between the workers and the algorithm, which is promising for algorithm-based automated fruit tree pruning.

Another common, affordable, and portable option is the RGB-Depth (RGB-D) camera. This sensor uses the ToF principle with an infrared laser, a stereo vision system, or a combination of both to acquire depth information. Yang et al. (2019) used an RGB-D sensor, Kinect (Kinect v2, Microsoft, Redmond, WA, USA), for fruit tree 3D reconstruction, in which the RGB image (1920 × 1080) can be mapped to its depth image (640 × 480) to generate registered 2.5D point cloud data using the ToF principle. Such RGB-D information can provide relatively reliable spatial coordinates of canopy objects, such as fruits and branches, within a few seconds and with much less effort in camera calibration. The results showed average relative errors of 2.5%, 3.6%, and 3.2% with respect to the measured tree height, width, and thickness, respectively. In addition to Kinect, RealSense cameras also play an important role as more compact RGB-D cameras on the market that are powered directly from the USB port, which can benefit field data collection and near-real-time processing. Dong et al. (2020) adopted a RealSense RGB-D camera (RealSense R200, Intel, Santa Clara, CA, USA; the “R-series” uses stereo vision to compute depth information) with a hand-held device to map a fruit orchard row by row from both sides. By integrating global features and semantic information, both sides of a series of trees can be merged and reconstructed to explore further canopy characteristics, such as canopy volume, fruit count, and trunk diameter. Unlike the “R-series,” “D-series” RealSense cameras utilize infrared light combined with stereo RGB matching to acquire depth information. Such a compact RGB-D camera can also be mounted on an unmanned aerial vehicle (UAV or drone) for faster canopy mapping.
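
The core of such RGB-D reconstruction is back-projecting each registered depth pixel into 3D with the pinhole camera model. The sketch below, using hypothetical intrinsics and a synthetic depth map rather than real sensor output, shows this deprojection step that underlies the 2.5D point clouds described above.

```python
# Illustrative sketch: back-projecting a registered depth image into an
# organized point cloud with the pinhole model (hypothetical intrinsics).
import numpy as np

fx, fy = 525.0, 525.0       # assumed focal lengths (pixels)
cx, cy = 319.5, 239.5       # assumed principal point for a 640 x 480 depth map

depth = np.full((480, 640), 1.2, dtype=np.float32)   # synthetic depth map in meters

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project every depth pixel (u, v, Z) to a 3D point (X, Y, Z)."""
    v, u = np.indices(depth.shape)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1)        # H x W x 3 organized cloud
    return points.reshape(-1, 3)[z.ravel() > 0]  # drop invalid (zero-depth) pixels

cloud = depth_to_points(depth, fx, fy, cx, cy)
print(cloud.shape)   # (307200, 3) for a fully valid 640 x 480 frame
```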

2.3.1.2 LiDAR Imaging for 3D Reconstruction

With the rapid development of high-performance computational platforms, Light Detection and Ranging (LiDAR) offers an alternative to conventional ToF cameras for outdoor 3D canopy reconstruction. Despite its denser data points and more complex calibration and preprocessing steps (Moreno et al., 2020; Wang et al., 2021), LiDAR scanning can offer the most accurate 3D mapping results. Underwood et al. (2016) presented a terrestrial scanning system equipped with LiDAR and RGB sensors to map flowers, canopy volume, and fruit distribution in almond orchards. Individual trees were scanned from both sides of the row at different times, and the complex internal branch structure and void spaces could be effectively detected by LiDAR mapping. However, some 3D canopy reconstructions were misaligned due to GPS or localization errors. Such misalignment can significantly affect the calculation of canopy geometry, such as canopy volume (where the voxel size was assumed to be 0.001 m³ and accumulated over a tree), and must be corrected manually (Rosell et al., 2009) or through simultaneous localization and mapping (SLAM) (Cheein & Guivant, 2014). While one of the biggest problems for ToF sensors is occlusion caused by complex branch structures and leaves, particularly in dense plants where some parts are entirely invisible to the scanner, LiDAR can still provide a reasonable level of accuracy. Bailey and Ochoa (2018) reconstructed a single dense-foliage tree by integrating terrestrial LiDAR point cloud data with ray-tracing simulation data (Weber & Penn, 1995), digitally generating and comparing more than 30,000 leaves. Critical canopy parameters at the leaf level, such as the 3D distribution of leaf angles for biophysical processes, were also assessed in that work. Another leaf-level study was completed by Berk et al. (2020), who assessed leaf area using a terrestrial LiDAR system on 20 apple trees for future precision spraying management. Beyond RGB and LiDAR data fusion, Narváez et al. (2016) demonstrated the integration of thermal imagery and LiDAR data on avocado trees using portable devices for 3D canopy characterization. Due to the resolution difference between the two data types, all individual frames must be registered together to obtain a point cloud in which each point is assigned a temperature value.
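
The voxel-accumulation idea mentioned above can be sketched in a few lines: canopy volume is approximated by counting the occupied voxels of a point cloud and multiplying by the voxel volume. The random point cloud here is only a stand-in for real LiDAR data; the 0.1 m voxel edge (0.001 m³) mirrors the voxel size quoted in the cited work.

```python
# Illustrative sketch: voxel-based canopy volume estimation from a point cloud
# (random points stand in for real LiDAR data; 0.1 m voxels = 0.001 m^3 each).
import numpy as np

rng = np.random.default_rng(0)
canopy_points = rng.uniform([-0.8, -0.8, 0.5], [0.8, 0.8, 2.5], size=(50_000, 3))

def voxel_volume(points, voxel_size=0.1):
    """Count unique occupied voxels and convert the count to cubic meters."""
    voxel_indices = np.floor(points / voxel_size).astype(np.int64)
    occupied = np.unique(voxel_indices, axis=0)
    return occupied.shape[0] * voxel_size ** 3

print(f"Estimated canopy volume: {voxel_volume(canopy_points):.3f} m^3")
```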

The agricultural environment can be complex and unpredictable. With continued increases in computational performance through advanced hardware and software, precise characterization of tree canopies can be achieved to better facilitate mechanized and automated operations in orchards. 3D canopy reconstruction is one of the most effective ways to provide high-resolution, reliable leaf- or fruit-wise results, and the resulting 3D locations can help agricultural robots with path planning and with reaching occluded targets, particularly in dense canopies. Some major advantages and disadvantages of reconstructing the entire 3D canopy or tree are summarized in Table 2.3 for comparison.

Table 2.3 Advantages and limitations of the 3D reconstruction for tree canopy

However, it is worth noting that offline reconstructions of perennial trees and canopies can be retrieved later, through integration with Real-Time Kinematic Global Positioning System (RTK-GPS) and Global Navigation Satellite System (GNSS) data, for further tasks such as precision spraying and pruning, because the main body of the plant remains essentially unchanged for at least about 10 years.

2.3.2 Image Processing Techniques

Typically, target crop localization, detection (Gongal et al., 2015), and segmentation (Amatya et al., 2017) from agricultural in-field imagery have been performed using methods such as morphological operations and color thresholding. However, due to the complex in-field environment and variable lighting conditions, these conventional methods are often insufficient. For example, to make the machine vision system work properly, operations may need to be conducted at night (Amatya et al., 2016), or additional equipment may need to be installed to reduce the influence of varying lighting conditions during the daytime, such as a black background curtain for an over-the-row machine (Gongal et al., 2016). Additionally, the processing speed is relatively slow, given the high resolution of the imagery acquired for precise operations.
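
A minimal sketch of such a conventional pipeline is given below: color thresholding in HSV space followed by morphological cleanup to extract candidate fruit regions with OpenCV. The image path and threshold values are hypothetical placeholders, and, as noted above, fixed thresholds of this kind tend to break down under changing field illumination.

```python
# Illustrative sketch of a conventional color-thresholding pipeline
# (hypothetical image path and HSV thresholds; not tuned for any real dataset).
import cv2
import numpy as np

image = cv2.imread("orchard_frame.jpg")          # assumed input image path
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

# Hypothetical HSV range for red apples (tuned per camera and lighting)
lower, upper = np.array([0, 120, 70]), np.array([10, 255, 255])
mask = cv2.inRange(hsv, lower, upper)

# Morphological opening/closing to remove speckle noise and fill small holes
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

# Keep reasonably large connected blobs as fruit candidates
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
fruit_blobs = [c for c in contours if cv2.contourArea(c) > 200]
print(f"Detected {len(fruit_blobs)} candidate fruit regions")
```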

2.3.2.1 Deep Learning Algorithms

Deep learning-based algorithms, enabled by state-of-the-art AI technologies, began attracting attention for image processing tasks about 10 years ago; in agriculture, this trend started around 2015. Instead of designing a network from scratch, researchers often start with pre-trained deep learning models (an approach known as transfer learning). Pre-trained networks capture rich feature representations because they were previously trained on large numbers of images from public databases, such as ImageNet (Deng et al., 2009) and CIFAR (Krizhevsky & Hinton, 2009). Compared to a randomly initialized network, a pre-trained network may learn better. After considering three key factors, i.e., accuracy, speed, and size, an appropriate network is selected and, in most cases, slightly modified at the output layers. Several commonly used networks are AlexNet (Krizhevsky et al., 2012), VGGs (Simonyan & Zisserman, 2014), ResNet (He et al., 2016), DenseNet (Huang et al., 2017), and NASNet (Zoph et al., 2018).
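
As a minimal sketch of this transfer-learning recipe (assuming PyTorch/torchvision and a hypothetical 3-class orchard patch classification task), the example below loads an ImageNet pre-trained ResNet-18, freezes its feature extractor, and replaces only the output layer; data loading and the full training loop are omitted.

```python
# Illustrative sketch of transfer learning: freeze a pre-trained backbone and
# fine-tune a new output layer for a hypothetical 3-class task.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 3  # assumed number of target classes (e.g., apple, branch, trunk patches)

# Requires torchvision >= 0.13; older versions use models.resnet18(pretrained=True)
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for param in model.parameters():                  # freeze the pre-trained features
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, num_classes)  # new trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One dummy training step on random data to show the fine-tuning mechanics
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, num_classes, (4,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"Dummy fine-tuning step complete, loss = {loss.item():.3f}")
```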

Deep learning is used in agricultural image processing mainly for two purposes: object detection and instance/semantic segmentation. For most in-field mechanized operations on specialty crops, only one or a few specific types of target objects need to be detected rather than the entire scene, such as fruits in fruit harvesting, flowers in blossom thinning, branches in shoot thinning, and leaves in targeted pesticide spraying. Therefore, deep learning-based object detection has been extensively studied (Kamilaris et al., 2018). Zhang et al. (2018a, b) deployed a Kinect RGB-D camera and a Region-based Convolutional Neural Network (R-CNN; fine-tuned AlexNet) to detect segments of apple tree branches. Once all branch segments were identified, the detection boxes and depth information were integrated to predict the branch trajectory for automated vibratory apple harvesting in an orchard environment. Similar work was completed by Majeed et al. (2020), in which segments of vine cordons were detected and then combined using Faster R-CNN and non-maximal suppression algorithms to estimate cordon trajectories for green shoot thinning during the dense-foliage stage. In addition to single-class detection, multiple targets can be detected at the same time. Zhang et al. (2020) demonstrated the detection of apples, trunks, and branches using Faster R-CNN with AlexNet or VGG backbones. By extracting the coordinates of the different objects in the image, the exact vibrating location can be precisely estimated for mechanical apple harvesting. More specifically, once the branch trajectories have been determined, the apple locations provide auxiliary information for deciding the grabbing points of the end-effector without damaging the fruits, with about 73% accuracy achieved. In similar work, Gao et al. (2020) investigated multiple classes of fruit conditions (i.e., non-occluded, leaf-occluded, branch-/trellis-wire-occluded, and fruit-occluded fruits) so that the harvesting machine can make better decisions and directly access collision-free fruits. Another option for understanding the entire scene is image segmentation, for instance, semantic segmentation, in which images are classified at the pixel level. This technique was initially used in autonomous driving (LeCun et al., 2015) and has also been applied to some agricultural tasks. Zhang et al. (2021) proposed a semantic segmentation method to identify tree trunks and branches for automated mass mechanical apple harvesting, with four pixel classes defined as apples, branches, trunks, and leaves. Compared to multi-class object detection, segmentation offers more background information (e.g., leaves) and, more importantly, gives the boundaries of each object. This is particularly useful when target objects have irregular shapes, so that an agricultural robot can plan specific paths to avoid potential collisions.

2.3.2.2 Improvements in Deep Learning

It was reported that deep learning-based methods overall outperformed conventional image processing methods, tackling agricultural tasks with approximately 41% higher accuracy (Kamilaris et al., 2018). As a result, deep learning has become common practice for handling images with complex backgrounds and lighting conditions, making it highly suitable for agriculture, where most operations are conducted in the field. At the same time, researchers are trying to improve deep neural network methodologies to address the inherent challenges. Introducing imbalanced data into a network can negatively impact the results (Van Hulse et al., 2007), and this situation is common in the agricultural field. If the target objects, such as fruit stems, occupy considerably fewer pixels in an image than other objects, it is challenging to train networks to detect them because most pixels belong to the noisy background. One potential way to resolve the problem is to design a regression CNN, or RegCNN (Kalampokas et al., 2021): instead of only assigning each pixel a specific class (i.e., grape stem or non-stem), the distances of the other pixels (i.e., non-stem) to the target pixels (i.e., grape stem) are calculated simultaneously. By utilizing regression models in CNNs, continuous values can be predicted to better estimate the stem location. In addition, high-resolution images are normally required in agricultural studies, but feeding such images directly into deep learning networks is often impractical. Zabawa et al. (2019) presented a practical approach of splitting large images into small patches, feeding them into the networks, and stitching all patches back together afterward. This method greatly improves computational speed while preserving the quality of the input images. Lastly, most deep learning applications in agriculture have involved dataset augmentation (Kamilaris et al., 2018), which artificially increases the diversity of the imagery dataset through operations such as flipping, rotation, and brightness adjustment.
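
A minimal sketch of this tiling idea is shown below: a large image is split into fixed-size patches, each patch is processed (here by a trivial placeholder instead of a real network), and the per-patch outputs are stitched back to the original layout. The patch size is a hypothetical choice, and overlap and padding handling are omitted for brevity.

```python
# Illustrative sketch: tile a large image into patches for inference and stitch
# the per-patch outputs back together (placeholder "network", hypothetical sizes).
import numpy as np

def split_into_patches(image, patch=512):
    """Yield (row, col, patch_array) tiles covering the whole image."""
    h, w = image.shape[:2]
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            yield r, c, image[r:r + patch, c:c + patch]

def stitch(results, shape):
    """Reassemble per-patch outputs (e.g., segmentation masks) into one mask."""
    canvas = np.zeros(shape, dtype=np.uint8)
    for r, c, tile in results:
        canvas[r:r + tile.shape[0], c:c + tile.shape[1]] = tile
    return canvas

high_res = np.zeros((2048, 3072, 3), dtype=np.uint8)        # synthetic orchard image
# Placeholder "network": mark every pixel of each patch as class 1
results = [(r, c, np.ones(p.shape[:2], dtype=np.uint8))
           for r, c, p in split_into_patches(high_res)]
mask = stitch(results, high_res.shape[:2])
print(mask.shape, mask.max())
```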

Because of the complexity and uncertainty of the uncontrolled agricultural environment, such as variable field and lighting conditions, deep learning has proven highly useful and suitable in this research area. However, unlike applications such as autonomous driving, which typically have large datasets available from many sources, agricultural research has always suffered from limited datasets. Additionally, agricultural datasets are difficult to aggregate because different sensors and methods are used for various research purposes at different precision levels. Therefore, every research team has to annotate large numbers of ground-truth labels, which is normally the most time-consuming step. More importantly, those annotations are often used only once and are difficult for other teams to reuse. Table 2.4 illustrates several major advantages and limitations of deep learning applications.

Table 2.4 Advantages and limitations of using deep learning algorithms in canopy object detection and segmentation

Although the use of 3D scene reconstruction techniques in agriculture has grown over the last few years, the nature of 3D data compared with 2D image data brings some constraints, such as the very dense point clouds from LiDAR and long, complex data processing and storage. The full potential of 3D point cloud data has yet to be realized due to the limited availability of resources and tools. Recently, the Google Artificial Intelligence (AI) group released TensorFlow 3D, along with a code library for 3D point cloud data processing (Huang et al., 2020; Najibi et al., 2020), bringing state-of-the-art deep learning capabilities to 3D object detection and 3D semantic/instance segmentation. With such efficient tools available, the barriers to deploying real-time inference systems for 3D scenes will be reduced for the entire research community.

2.4 Robotic Branch Pruning for Modern Apple Trees (Case Study)

2.4.1 Introduction

Pruning of apple trees is one of the most labor-intensive operations, requiring about 80–120 labor hours per hectare (Mika et al., 2016) and accounting for 20% of total labor costs (Crassweller et al., 2020). Robotic pruning of apple trees is challenging due to the complex tree canopy: the random orientation of the branches makes it difficult for the cutter to reach the desired orientation. Thus, a pruning robot should be designed with the complex apple tree environment in mind. Many studies have reported camera vision systems for 3D canopy reconstruction of apple trees (Karkee et al., 2014; Tabb & Medeiros, 2017). However, no considerable contribution has been reported on the development of the mechanical system, including the manipulator and end-effector, for pruning apple trees (He & Schupp, 2018).

The joint configuration of the manipulator should be selected with the work environment in mind, because the configuration determines the postures the robot must adopt for a specific task; a poor choice leads to poor performance. The end-effector is an integral component of a robotic pruning system, consisting of a tool to perform the pruning cut. Only a few studies have reported pruning end-effectors with different cutting mechanisms, such as disk saws and shear blades (Botterill et al., 2017; Zahid et al., 2020a). Considering the complexity of tree canopies, compact robotic cutters with appropriately sized components are essential for successful operation.

Manipulation within the tree canopy can result in collisions with branches, which reduce the quality of the pruning operation (Gongal et al., 2016). Collision-free path planning schemes are widely used for the motion planning of many systems, such as autonomous vehicles and industrial robots (Noreen et al., 2016). LaValle (1998) proposed the rapidly exploring random tree (RRT) algorithm for path planning, which shows high efficiency compared with other available path planning schemes. However, RRT path solutions are not always smooth or optimal, resulting in longer computation times and slow convergence.

Considering this knowledge gap, the primary goal of this study was to develop a robotic manipulation system, including the manipulator and the end-effector, for pruning apple trees. In addition, different collision-free path planning algorithms were developed for the robotic pruning of apple trees. Finally, a series of field tests were conducted on Fuji apple trees to validate the system performance.

2.4.2 Design of the Robotic Pruning Manipulator System

2.4.2.1 Pruning End-Effector Design

The primary criterion for the end-effector design was a minimal spatial requirement while maneuvering to position the cutter at a specific orientation. The end-effector should reach the target in a pose that places the branch within the shear blade opening (Zahid et al., 2020a). Thus, the end-effector should also have high kinematic dexterity to attain multiple cutter poses at each point in the workspace. A compact pruning end-effector was designed with an intrinsic three-revolute (3R) degrees-of-freedom (DoF) configuration (Fig. 2.5) using the computer-aided design (CAD) software SolidWorks (v. 2020, Dassault Systèmes, Vélizy-Villacoublay, France). The design consists of three motors, each providing one revolute DoF to the end-effector. The widely accepted rotation convention of yaw, pitch, and roll (θ1, θ2, and θ3) was used to configure the DoFs of the end-effector. The selection of the cutting mechanism was critical to ensure a successful pruning operation; because efficient pruning requires smooth, split-free cuts, a shear blade was attached directly to the end-effector as the cutting tool. The maximum rotations for θ1, θ2, and θ3 were 240°, 360°, and 360°, respectively.

Fig. 2.5

Concept design of the end-effector. (Components: (1) motor for yaw rotation, (2) motor for pitch rotation, (3) motor for roll rotation, (4) self-locking worm gearbox, (5) shear cutter, (6) cutter). (Zahid et al., 2020b)

2.4.2.2 Integrated Pruning Manipulator Design

The design of the pruning manipulator was a critical task due to the dense apple tree canopy. The key consideration in developing the pruning manipulator was the spatial requirement of the system. During manipulation, the manipulator occupies a portion of the 3D workspace as it changes its pose to attain a specific position and orientation of the end-effector cutter, and the magnitude of the pose change depends on the DoFs of the manipulator. Thus, it was essential to select DoFs that offer minimal pose change. A three-prismatic (3P) DoF system was selected to position the integrated 3R DoF end-effector cutter at target branches because of the low pose change attributes of the Cartesian/linear system. The integrated six-DoF (3R3P) robotic pruning system, including the 3P DoF manipulator and the 3R DoF end-effector, is shown in Fig. 2.6. The 3P DoF manipulator was equipped with prismatic joints to move along the x-, y-, and z-axes. To avoid dynamic instability and vibration due to the end-effector payload, the system uses a square base platform for motion along the x- and y-axes, and the pruning end-effector is attached to a linear arm on the z-axis.

Fig. 2.6

SolidWorks model of the end-effector attached to a Cartesian manipulator; the components include (1) x-axis rails, (2) y-axis rail, (3) z-axis linear actuator, (4) axis limit switches, (5) linear arm, and (6) pruning end-effector. (Zahid et al., 2020b)

2.4.2.3 Performance Indices of Robotic Pruner

The kinematic model of the robot was developed by calculating the Denavit-Hartenberg (DH) parameters to simulate the robot performance indices. Details of the robot kinematic model and the DH parameter calculation can be found in the original research article (Zahid et al., 2020b). The forward kinematics of the integrated manipulator was used to calculate different performance indices, including the reachable workspace, cutter frame orientations, manipulability, and velocity ellipsoids. The simulations were performed in Matlab (R2019a, MathWorks, Natick, MA, USA).
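
For readers unfamiliar with the manipulability index used below, the following sketch computes w = sqrt(det(J J^T)) from a finite-difference Jacobian of a simple, hypothetical 3R arm (not the published 3R3P model); values near zero flag singular configurations of the kind identified in Figs. 2.8 and 2.9.

```python
# Illustrative sketch: manipulability index w = sqrt(det(J J^T)) of a
# hypothetical 3R arm, with the Jacobian estimated by central differences.
import numpy as np

L1, L2, L3 = 0.12, 0.10, 0.08        # hypothetical link lengths (m)

def fk(q):
    """End-point position of a 3R chain (base yaw plus two pitch joints)."""
    t1, t2, t3 = q
    r = L2 * np.cos(t2) + L3 * np.cos(t2 + t3)   # horizontal reach
    z = L2 * np.sin(t2) + L3 * np.sin(t2 + t3)   # vertical offset
    return np.array([r * np.cos(t1), r * np.sin(t1), z + L1])

def manipulability(q, eps=1e-6):
    """Finite-difference Jacobian, then w = sqrt(det(J J^T))."""
    J = np.zeros((3, 3))
    for i in range(3):
        dq = np.zeros(3); dq[i] = eps
        J[:, i] = (fk(q + dq) - fk(q - dq)) / (2 * eps)
    return np.sqrt(max(np.linalg.det(J @ J.T), 0.0))

print(f"well-conditioned pose: {manipulability([0.4, 0.6, -0.8]):.5f}")
print(f"near-singular pose:    {manipulability([0.4, 0.0,  0.0]):.5f}")  # stretched arm
```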

The simulation results for the reachable workspace and cutter frame orientations of the end-effector are shown in Fig. 2.7a–c, where the green, blue, and red lines show the 3D cutter frame of the end-effector. The workspace analysis indicated that the designed robotic pruning system has a spherical workspace 240 mm in diameter, with a void due to joint limits. The void space may not affect the robot's performance, as it is very unlikely that branches would be pruned by rotating the cutter backward; even in that situation, the Cartesian positioning system can move the end-effector backward so that the branch lies on the front side of the cutter. The simulation also indicated that the end-effector can attain a wide range of cutter orientations and can reach almost every branch within the robot's workspace. The manipulability index was found to be independent of the rotation of the first and last joints of the end-effector (Fig. 2.8), and the results suggested that the system has only two undesirable singularity configurations. Based on the velocity ellipsoid simulations (Fig. 2.9), these singularity configurations occur only when the cutter points vertically up or down (red lines), a very unlikely scenario for cutting branches.

Fig. 2.7

Reachable workspace for the integrated end-effector with (a) reachable points, (b) cutter plane, and (c) cutter frame. (Zahid et al., 2020b)

Fig. 2.8

Manipulability index of the integrated pruning end-effector. (Zahid et al., 2020b)

Fig. 2.9
3 sets of 3-D charts. 1. z in millimeters versus y in millimeters represents ellipsoids for theta 2 on the left panel. 2. z in millimeters versus x in millimeters represents ellipsoids for theta 2 in the center panel. 3. y in millimeters versus x in millimeters represents ellipsoids for theta 2 on the right panel.

Manipulability ellipsoids with rotation of theta 2 at different coordinate planes. (Zahid et al., 2020b)

2.4.3 Collision-Free Path Planning for Robotic Pruning

2.4.3.1 Reconstruction of Apple Trees

The 3D model of an apple tree was required to create collision-free paths. The data collection system consisted of a 3D laser scanner (VLP-16, Velodyne LiDAR, San Jose, CA, USA) and a laptop computer (Dell, Round Rock, TX, USA). The 3D point cloud data were preprocessed using MATLAB. Through the preprocessing, the point cloud image of the apple tree was established (Fig. 2.10a). A point cloud algorithm was used for the segmentation of branches and tree trunks (Fig. 2.10b). A few small branches were missed by the LiDAR scanner due to the limited sensor resolution. However, those small branches were not considered potential obstacles and were ignored in the 3D reconstruction. To connect the point clouds of the trunk and branches, the Spline() function was used (Fig. 2.10c).
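The minimal sketch below illustrates the general idea of connecting segmented branch points with a cubic spline. It uses SciPy as a rough Python stand-in for the Spline() function mentioned above, and the branch coordinates are hypothetical values chosen only for demonstration.

# Rough sketch of reconstructing a branch centerline from segmented points with
# a cubic spline (analogous in spirit to the Spline() step; details assumed).
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical centerline samples of one segmented branch (m), ordered from
# the trunk attachment point outward.
branch_pts = np.array([
    [0.00, 0.00, 1.20],
    [0.10, 0.02, 1.24],
    [0.22, 0.05, 1.30],
    [0.35, 0.06, 1.33],
])

# Parameterize by cumulative chord length so the spline follows the branch shape.
seglen = np.linalg.norm(np.diff(branch_pts, axis=0), axis=1)
s = np.concatenate([[0.0], np.cumsum(seglen)])

spline = CubicSpline(s, branch_pts, axis=0)     # interpolates x, y, z versus arc length
dense = spline(np.linspace(0.0, s[-1], 100))    # densified centerline for the 3D model
print(dense.shape)  # (100, 3)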

Fig. 2.10
3 sets of 3-D charts of z in meters versus y and x in meters represent a point cloud image of the apple tree in a, a segmentation of branches and tree trunks in b, and a 3-D reconstructed view of an apple tree in c.

(a) Point cloud data from the LiDAR sensor. (b) Segmented tree trunk and primary branches. (c) A view of a 3D reconstructed apple tree

2.4.3.2 Path Planning Algorithms and Simulation

An obstacle avoidance algorithm based on a rapidly exploring random tree (RRT) search was implemented to find collision-free paths to the target pruning points. The algorithm performs two checks: manipulator collision and end-effector path collision. The target branch and pruning cut point coordinates were provided to the algorithm to start the pathfinding. If a newly sampled RRT node lies in collision-free space, it is added to the tree, and the process continues until the connected nodes reach the target location. RRT path smoothing and optimization algorithms were also developed to improve the planned paths. The smoothing step reduces the path length by removing unnecessary path nodes, while for path optimization a nonlinear optimization algorithm was implemented with initial and boundary conditions. The minimum avoidance distance from obstacles was set to 60 mm.
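A minimal RRT sketch is given below to illustrate the sampling, nearest-node extension, clearance check, and backtracking steps described above. The point-based obstacle model, step size, goal bias, and coordinates are simplifying assumptions for illustration only; the actual planner also checks the manipulator links and the end-effector geometry.

# Minimal 3D RRT sketch with a placeholder point-obstacle clearance check.
import numpy as np

CLEARANCE = 0.06  # 60-mm minimum avoidance distance from obstacles (m)

def collision_free(p, obstacles):
    """True if point p keeps the required clearance from all obstacle points."""
    return all(np.linalg.norm(p - o) > CLEARANCE for o in obstacles)

def rrt(start, goal, obstacles, bounds, iters=5000, step=0.1, goal_bias=0.1):
    rng = np.random.default_rng(0)
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    nodes, parents = [start], [0]
    for _ in range(iters):
        sample = goal if rng.random() < goal_bias else rng.uniform(bounds[0], bounds[1])
        nearest = min(range(len(nodes)), key=lambda i: np.linalg.norm(nodes[i] - sample))
        direction = sample - nodes[nearest]
        dist = np.linalg.norm(direction)
        if dist < 1e-9:
            continue
        new = nodes[nearest] + min(step, dist) * direction / dist
        if not collision_free(new, obstacles):
            continue                      # discard nodes violating the clearance
        nodes.append(new)
        parents.append(nearest)
        if np.linalg.norm(new - goal) <= step:   # close enough: backtrack the path
            path, i = [goal], len(nodes) - 1
            while i != 0:
                path.append(nodes[i])
                i = parents[i]
            path.append(start)
            return path[::-1]
    return None  # no collision-free path found within the iteration budget

# Crude trunk model as a column of points; start/goal are hypothetical (m).
obstacles = [np.array([0.0, 0.0, z]) for z in np.arange(0.8, 1.6, 0.05)]
path = rrt(start=[-0.4, -0.4, 1.0], goal=[0.3, 0.1, 1.3], obstacles=obstacles,
           bounds=(np.array([-0.5, -0.5, 0.8]), np.array([0.5, 0.5, 1.6])))
print(None if path is None else len(path))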

The path planning was performed in a simulation environment to reach different target pruning points (Fig. 2.11). The target pruning points on each branch were marked 20 mm away from the tree trunk. The path planning algorithms successfully reached the target branches with different cutter orientations as defined in the algorithms. The RRT algorithm found collision-free paths (red line) for the defined pruning points within the virtual tree environment (Fig. 2.11). The smoothing and optimization methods reduced the RRT path lengths (green line) for all target branches by removing redundant nodes from the original paths. The mean computational time was 14 s per branch. The path planning time depends on the number of collision checks required to establish a collision-free path. Because the Cartesian motion (3P) occurred outside the tree canopy, the collision check was performed only for the rotational part of the robotic pruner (the 3R DoF end-effector) and the linear arm (which positions the end-effector inside the canopy), thus reducing the overall computational time for creating the path.
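The node-removal style of smoothing described above can be illustrated with a simple greedy shortcutting routine, sketched below under the same simplifying assumptions as the RRT example (point obstacles and a sampled straight-line edge check); it is not the chapter's exact smoothing or optimization procedure.

# Sketch of node-removal path smoothing: try to connect non-adjacent nodes
# directly and drop the nodes in between whenever the shortcut stays clear.
import numpy as np

def edge_free(a, b, obstacles, clearance=0.06, n=20):
    """Sample points along segment a-b and require the clearance at each one."""
    for t in np.linspace(0.0, 1.0, n):
        p = (1.0 - t) * a + t * b
        if any(np.linalg.norm(p - o) <= clearance for o in obstacles):
            return False
    return True

def smooth(path, obstacles):
    """Greedy shortcutting: keep jumping to the farthest directly reachable node."""
    path = [np.asarray(p, float) for p in path]
    out, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not edge_free(path[i], path[j], obstacles):
            j -= 1
        out.append(path[j])
        i = j
    return out

# Tiny self-contained demo; with the RRT sketch above, smooth(path, obstacles)
# would be called on the planner's output instead.
demo_path = [np.array(p, float) for p in ([0.0, 0.0, 0.0], [0.1, 0.2, 0.0],
                                          [0.2, 0.1, 0.0], [0.4, 0.0, 0.0])]
print(len(smooth(demo_path, obstacles=[np.array([1.0, 1.0, 1.0])])))  # -> 2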

Fig. 2.11
A 3-D chart of z in meters versus x and y in meters represents poly deg 3 trajectory generation at t equal to 5.0634. A new robot is labeled.

Collision-free path planning using a 3R3P DoF robotic pruning manipulator

2.4.4 Prototype Development and Field Tests

The prototype of the integrated robotic pruner was developed at Penn State's Fruit Research and Extension Center, Biglerville, Pennsylvania (Fig. 2.12). A set of three DC geared motors was used for the 3R DoF end-effector. A modified DC motor-powered shear pruner, coupled with a gearbox, was attached as the end-effector cutting tool. Two NEMA-17 stepper motors were used to establish the Cartesian motion along the x- and y-axes. To convert the rotational motion to linear motion, belt-and-pulley mechanisms were attached to the motor shafts. Because the z-axis has to carry the linear arm and the end-effector payload, a NEMA-34-driven lead-screw actuator was used. For field tests, an Arduino-based control system was developed to control the movement of the integrated robotic pruning system.
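As a rough illustration of the rotational-to-linear conversion, the sketch below computes millimeters of travel per motor step for a belt-pulley axis and a lead-screw axis. All parameter values are hypothetical examples, not the prototype's actual specifications.

# Back-of-the-envelope steps-to-travel conversion for belt-pulley (x/y) and
# lead-screw (z) drives; all parameter values are assumed, not measured.
FULL_STEPS_PER_REV = 200                  # typical 1.8-degree stepper motor
MICROSTEPPING = 16                        # assumed driver setting
PULLEY_TEETH, BELT_PITCH_MM = 20, 2.0     # assumed GT2-style belt and pulley
LEAD_MM = 8.0                             # assumed lead-screw travel per revolution

steps_per_rev = FULL_STEPS_PER_REV * MICROSTEPPING
mm_per_step_belt = (PULLEY_TEETH * BELT_PITCH_MM) / steps_per_rev   # x-, y-axes
mm_per_step_screw = LEAD_MM / steps_per_rev                         # z-axis

def steps_for_travel(distance_mm, mm_per_step):
    """Number of steps to command for a desired axis displacement."""
    return round(distance_mm / mm_per_step)

print(steps_for_travel(100.0, mm_per_step_belt),
      steps_for_travel(100.0, mm_per_step_screw))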

Fig. 2.12
A photograph displays the installation of an integrated pruning system set up in an apple orchard.

Experimental setup of the integrated pruning system in a Fuji apple orchard. (Zahid et al., 2020b)

The field tests were conducted on Fuji apple trees trained to a fruiting wall architecture. In total, 100 cuts were applied to branches over a wide range of orientations. The cuts were applied 20 mm from the tree trunk to evaluate the end-effector cutter's capability to prune branches close to the trunk. For each successful cut, the branch diameter and the robot's joint angles were recorded. The field tests validated the design parameters of the integrated pruning system and the simulation results. During the tests, it was observed that the cutter could collide with the trunk when the target point was close to the trunk and only a perpendicular cutting posture was used. The chance of missing the target branch increased when the cutter plane and branch axis were not perpendicular, as the effective cutter opening for branch entry was reduced. The developed cutter was able to reach all targeted branches and cut branches up to 25 mm in diameter. With this cutting capability, the developed robotic system is suitable for modern apple tree architectures.

2.5 Conclusion and Future Directions

As discussed earlier, tree structures in modern orchards are becoming much simpler with the adoption of intensive planar training systems. These simpler canopies provide opportunities for implementing mechanical and robotic solutions for the in-field tasks of tree fruit production. To be considered successful, a robotic system must be accurate, robust, fast, and inexpensive. However, even with modern trees, these in-field tasks can still be challenging due to the nature of the biological system, especially the interaction between the tree canopy and the robotic system. For example, in robotic pruning, the cutting end-effector must apply the cuts with high precision, at the right locations, and perpendicular to the branch orientation. This chapter reviewed and analyzed the core technologies for robotic solutions in modern tree orchards, including robot-canopy interaction, in-field sensing, collision-free path calculation, and manipulation control.

Regarding in-field sensing, although canopy reconstruction may provide more in-depth 3D information for in-field mechanized operations, speed and computation constraints have limited its current use. During the last few years, proximal in-field sensing technologies and processing techniques have advanced greatly alongside the rapidly developing AI-driven disciplines. Deep learning-based techniques have gradually become common practice for image processing (LeCun et al., 2015), which is the critical first step for orchard automation and mechanization. As a continuously growing community becomes more interested in applying AI-enabled deep learning techniques to agricultural research, future directions include (1) further improvement of deep learning network architectures, such as adding attention mechanism modules (Fu et al., 2019), using regional dropout methods (DeVries & Taylor, 2017), and adding gradient noise (Neelakantan et al., 2015); (2) utilizing generative adversarial networks (GANs; Goodfellow et al., 2014) to address the main issue of the limited number of agricultural images and annotations; and (3) further developing semi-/self-supervised deep learning techniques (Ji et al., 2019; Wu & Prasad, 2017) that require much less or no manual annotation. Although this is a highly promising research direction, some major concerns remain. For example, many researchers still focus on the sensing technologies themselves rather than on implementing them in actual mechanized orchard operations or canopy management. In addition, onboard computing with embedded systems (e.g., the NVIDIA Jetson TX2 module) will be necessary for deploying such deep learning-driven techniques in real orchard scenarios.

The accessibility of the robotic manipulator and end-effector is challenging due to the complexity and variability of the agricultural environment, as well as the required speed of operation. Previously developed pruning robots typically used serial robotic arms, but the level of specificity required in the spatial placement of the end-effector results in a complex set of maneuvers and slows the pruning process. Meanwhile, a serial robot arm with an end-effector requires a large space for the cutter to engage with the branches. Although not aimed at pruning directly, efforts have been made to simplify the maneuvers and improve the efficiency of robotic operations in harvesting. Two robotic fruit-picking systems have been developed and tested; as mentioned earlier, one is from FFRobotics (Gesher HaEts 12, Bnei Dror, Israel), and the other is from Abundant Robotics (California, USA). Similar robotic arms could be considered for developing pruning systems. However, these arms did not need to achieve specific orientations to pick fruit. For robotic pruning, the end-effector (cutter) needs not only to reach the right location but also to be placed perpendicular to the branch. To remain perpendicular to the branch while using a parallel type of robotic arm, the end-effector should have an adjustable orientation. With this kind of end-effector, the cutter itself could be rotated with very little spatial effort. Moreover, the cutter could use a saw blade, which has no specific orientation constraint.

Finally, the economics of the robotic pruning system also needs to be considered. Typically, the cost of a robotic system is high. The use of off-the-shelf robotic arms (such as Robolink, Igus) and low-cost sensing systems (such as the Kinect v2, Microsoft) could decrease the overall cost of robotic systems. Therefore, considering the labor shortage issue and the potential of building low-cost robotic pruning systems from off-the-shelf components, the benefit of developing a robotic pruning system would be obvious. Meanwhile, multiple robots could be employed to improve working efficiency. The cost/benefit ratio of a robotic system will have to be analyzed after the machine is built.