1 Introduction

Single cell microinjection is one of the key technologies for biological research and applications (Saleem and Kannan 2018; Adamson et al. 2018). In general, single cell microinjection begins by picking, moving, and immobilizing the cell with a holding micropipette, followed by puncturing the cell with an injection needle and delivering foreign substances such as DNA, RNA, proteins, and drug compounds (Murphey et al. 2006; Pitchar et al. 2019; Huang et al. 2009). Automating these processes will have a far-reaching influence on the efficiency and productivity of many applications (Argenton et al. 2007; Chen et al. 2017; Chung et al. 2015), and considerable research effort and progress have been devoted to this issue (Zhuang et al. 2017; Liu et al. 2017). In real-world applications, many important operations and processes, such as vitrification of mammalian embryos (Liu et al. 2015), suction of sperm by a micropipette (Zhang et al. 2012), identification and positioning of specific structures of a cell (Wang et al. 2017; Gianaroli et al. 2011; Chen et al. 2016), perception and control of needle penetration force (Ghanbari et al. 2012; Liu et al. 2007; Wang et al. 2017), adjustment of cell posture (Zhao et al. 2015; Benhal et al. 2014; Xie et al. 2016), and single cell microinjection itself, have been partially automated. In our previous work (Zhang et al. 2019), automated injection of a zebrafish embryo was conducted using an electrothermal microgripper. It was demonstrated that the microgripper not only can effectively pick and hold the cell (Luca 2011; Wang and Xu 2017; Wang et al. 2007), but also can immobilize the cell much more firmly than a holding micropipette. In addition, the deformation of the embryo can be kept to a minimum by the two clamping jaws.

However, automation of the injection process assumed that the micropipette/microgripper, the cell, and the injection needle had all been positioned in the micro field of view (micro-FOV) and well focused in the culture medium. In practice, several preparation steps are required, such as moving the micropipette/microgripper, the biological cells, and the injection needle from outside into the micro-FOV, and lowering the micropipette/microgripper and the injection needle so that they are immersed in the liquid and within the focusing range. In the robotic single cell microinjections reported in the literature, these preparations were conducted manually. As shown in Fig. 1, the green and red arrows indicate the manual operations and the automated processes, respectively.

Fig. 1 The schematic illustration of the whole process of cell microinjection

This work aims to automate the preparation process, indicated by the green arrows in Fig. 1, by dividing it into two sub-tasks. The first task is to move the microgripper from outside into the micro-FOV. First, an extra macro camera is used to create a macro-FOV containing the microscope objective lens (where the micro-FOV is located), the microgripper, and the cell. Then, a large rectangular matching template is used to identify the objective lens (that is, the position of the micro-FOV). Once the lens is matched, the matching rectangle is used as the searching area for finding the micro-FOV. Following a serpentine or zigzag searching path, the microgripper is moved from the lower-left corner of the rectangular area into the micro-FOV. A grid-line algorithm is proposed to detect the microgripper jaws. The second task is to move the microgripper jaws down until they are immersed in the liquid and to auto-focus the jaws. Various auto-focusing methods exist. Wang proposed a technique to detect the contact of the micropipette tip with the petri dish (Liu et al. 2015; Wang et al. 2007): by moving the micropipette down until it contacts the petri dish and then lifting it by a calibrated distance, the micropipette becomes clear in the image. A number of auto-focusing algorithms have been proposed, such as the AUM algorithm (Sun et al. 2004) and the content-based auto-focusing algorithm (Hamm et al. 2010). However, these algorithms work under the assumption that the micropipette tip has already been immersed in the liquid. When the micropipette tip makes contact with the surface of the liquid, a black shadow appears around the contact area in the micro-FOV. This shadow makes auto-focusing rather challenging since a lot of image information is lost. In this work, we propose an algorithm to identify the contact. Once the microgripper jaws have been fully immersed in the liquid, the contact stage ends, and auto-focusing algorithms can be applied. We compared and evaluated the performance of eight algorithms to provide insight and guidance on the selection of an appropriate auto-focusing algorithm. Finally, 100 experiments were carried out to validate the proposed algorithms and methods.

This paper is organized as follows. Section 2 presents the microinjection system and identifies the problems. The subsequent sections address the detailed problem descriptions, algorithm implementations, analyses, and results. Finally, experimental testing is conducted, and the proposed algorithms are evaluated based on the testing results to demonstrate their feasibility, robustness, and efficiency.

2 Problem identification and algorithms

2.1 Microinjection system setup

The zebrafish embryo, with a diameter of around 600–800 microns, is one of the most commonly used single cells in biological research, drug discovery, and many other applications (Xie et al. 2008). In our previous work, we accomplished automated single zebrafish embryo microinjection using an electrothermal microgripper (Zhang et al. 2019). However, in common with the current methods reported in the literature, the automation process started from the point at which the gripper had already been positioned in the micro-FOV with the gripper jaws hanging right over the cell. The whole automation process was based on teaching, i.e., all the steps were predefined. The microgripper was positioned at the same height above the bottom of the petri dish as the floating embryo so that it appeared clear in the image; as a result, no auto-focusing algorithms were involved. Before that point, however, the microgripper jaws, the cell, and the injection needle tip needed to be positioned manually in the micro-FOV. In this work, we aim to automate both the macro-to-micro positioning and the focusing steps based on the proposed algorithms. All the algorithms and methods presented in this paper can be applied to other microtools and micro-manipulation processes.

A robotic micro-manipulation system is established for zebrafish embryo microinjection, as shown in Fig. 2. The micro-FOV is created by an inverted microscope (Ml-11, Mshot) with a 4X objective lens. To realize the macro-to-micro FOV positioning, an extra macro camera (HD98, Gucee) is added to the system. The microgripper and the injection needle are mounted on a pair of micro-manipulators (CFT-8301D2), each providing independent linear motion along the X, Y, and Z axes. The microgripper is mounted on a PCB. The zebrafish embryo is cultured in water in a petri dish, which is placed on a motorized XY stage (FL35ST28, 0504BF12).

Fig. 2 a The photo of the system. b The schematic diagram of the system

2.2 Visual system calibration

As shown in Fig. 3, the macro camera coordinate system (\( O_{c} - X_{c} Y_{c} Z_{c} \)), the micro camera coordinate system (\( O_{s} - X_{s} Y_{s} Z_{s} \)), and the operation coordinate system (\( O_{m} - X_{m} Y_{m} Z_{m} \)) are established. The image coordinate systems \( O_{s} - u_{s} v_{s} \) and \( o_{c} - u_{c} v_{c} \) are based on the (\( O_{s} - X_{s} Y_{s} Z_{s} \)) and (\( O_{c} - X_{c} Y_{c} Z_{c} \)) coordinate systems, respectively. The micro camera coordinate system (\( O_{s} - X_{s} Y_{s} Z_{s} \)) is established on the focal plane: the origin \( O_{s} \) is the intersection of the optical axis centerline with the focal plane, and the \( Z_{s} \) axis points along the optical axis towards the observation target. The \( u_{s} \) and \( v_{s} \) axes coincide with the \( X_{s} \) and \( Y_{s} \) axes, respectively. The operation coordinate system (\( O_{m} - X_{m} Y_{m} Z_{m} \)) is established at a certain point in the operation space, with the \( X_{m} \), \( Y_{m} \), and \( Z_{m} \) axes aligned with the X, Y, and Z motion directions of the manipulator. The system permits online calibration while the microgripper and the injection pipette are controlled, via image-based visual servoing, to immobilize and inject the embryo cell, respectively. When the target is moved in the focal plane, the coordinates of the manipulator in \( O_{m} - X_{m} Y_{m} Z_{m} \) and the corresponding position in \( O_{s} - u_{s} v_{s} \) are stored in the microchip memory for calibration. In micro-operations, relative relationships are used more frequently and are easier to obtain, so the relative motion relationships between the different coordinate systems are considered here.

Fig. 3 Schematic diagram of visual servo system

$$ \left[ {\begin{array}{*{20}c} {\Delta x_{m} } \\ {\Delta y_{m} } \\ {\Delta z_{m} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {n_{x} } & {o_{x} } & {a_{x} } \\ {n_{y} } & {o_{y} } & {a_{y} } \\ {n_{z} } & {o_{z} } & {a_{z} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {\Delta x_{s} } \\ {\Delta y_{s} } \\ {\Delta z_{s} } \\ \end{array} } \right] = R_{s}^{m} \left[ {\begin{array}{*{20}c} {\Delta x_{s} } \\ {\Delta y_{s} } \\ {\Delta z_{s} } \\ \end{array} } \right] $$
(1)

where (\( \Delta x_{m} \), \( \Delta y_{m} \), \( \Delta z_{m} \)) and (\( \Delta x_{s} \), \( \Delta y_{s} \), \( \Delta z_{s} \)) are the relative movement amounts in the operating coordinate system and the micro camera coordinate system, respectively, and \( R_{s}^{m} \) is the attitude transformation matrix from the micro camera coordinate system to the operating coordinate system. The imaging plane of the micro camera is perpendicular to its optical axis, so when the object is clearly imaged, \( \Delta z_{s} = 0 \) holds. According to the internal parameter model of the camera,

$$ \left[ {\begin{array}{*{20}c} {\Delta x_{s} } \\ {\Delta y_{s} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {1/k_{x} } & 0 \\ 0 & {1/k_{y} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {\Delta u_{s} } \\ {\Delta v_{s} } \\ \end{array} } \right] $$
(2)

where (\( \Delta u_{s} \), \( \Delta v_{s} \)) are the image coordinate increments, and \( k_{x} \) and \( k_{y} \) are the magnification coefficients from the imaging plane coordinates to the micro camera coordinates.

According to (1) and (2),

$$ \left[ {\begin{array}{*{20}c} {\Delta x_{m} } \\ {\Delta y_{m} } \\ {\Delta z_{m} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {n_{x} /k_{x} } & {o_{x} /k_{y} } \\ {n_{y} /k_{x} } & {o_{y} /k_{y} } \\ {n_{z} /k_{x} } & {o_{z} /k_{y} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {\Delta u_{s} } \\ {\Delta v_{s} } \\ \end{array} } \right] = J^{ + } \left[ {\begin{array}{*{20}c} {\Delta u_{s} } \\ {\Delta v_{s} } \\ \end{array} } \right] $$
(3)

where \( J^{ + } \) is the transformation matrix between the motion increment of the target in the image space and its motion increment in Cartesian space, that is, the pseudo-inverse of the image Jacobian matrix.

Equation (3) is a relative measurement model based on the image Jacobian matrix. During calibration, the manipulator drives the feature points to move actively in the clear imaging plane of the microscope camera. The movement of a feature point in Cartesian space is read from the controller of the manipulator, and its movement in the image space is obtained by image processing.

For \( n \) active movements of the selected target in the clear imaging plane, we get

$$ \left[ {\begin{array}{*{20}c} {\Delta x_{m1} } & {\Delta x_{m2} } & \cdots & {\Delta x_{mn} } \\ {\Delta y_{m1} } & {\Delta y_{m2} } & \cdots & {\Delta y_{mn} } \\ {\Delta z_{m1} } & {\Delta z_{m2} } & \cdots & {\Delta z_{mn} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {J_{11}^{ + } } & {J_{12}^{ + } } \\ {J_{21}^{ + } } & {J_{22}^{ + } } \\ {J_{31}^{ + } } & {J_{32}^{ + } } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {\Delta u_{1} } & {\Delta u_{2} } & \cdots & {\Delta u_{n} } \\ {\Delta v_{1} } & {\Delta v_{2} } & \cdots & {\Delta v_{n} } \\ \end{array} } \right] $$
(4)

where (\( \Delta x_{mi} \), \( \Delta y_{mi} \), \( \Delta z_{mi} \)) is the relative movement of the i-th active movement of the feature point in Cartesian space, (\( \Delta u_{i} \), \( \Delta v_{i} \)) is the corresponding movement in image space, and n is the number of movements. If the feature points undergo at least two linearly independent motions, the matrix B defined below is of full rank. Therefore, the pseudo-inverse \( J^{ + } \) of the image Jacobian matrix is

$$ J^{ + } = AB^{T} \left( {BB^{T} } \right)^{ - 1} $$
(5)

where \( J^{ + } = \left[ {\begin{array}{*{20}c} {J_{11}^{ + } } & {J_{12}^{ + } } \\ {J_{21}^{ + } } & {J_{22}^{ + } } \\ {J_{31}^{ + } } & {J_{32}^{ + } } \\ \end{array} } \right] \), \( A = \left[ {\begin{array}{*{20}c} {\Delta x_{m1} } & {\Delta x_{m2} } & \cdots & {\Delta x_{mn} } \\ {\Delta y_{m1} } & {\Delta y_{m2} } & \cdots & {\Delta y_{mn} } \\ {\Delta z_{m1} } & {\Delta z_{m2} } & \cdots & {\Delta z_{mn} } \\ \end{array} } \right] \), and \( B = \left[ {\begin{array}{*{20}c} {\Delta u_{1} } & {\Delta u_{2} } & \cdots & {\Delta u_{n} } \\ {\Delta v_{1} } & {\Delta v_{2} } & \cdots & {\Delta v_{n} } \\ \end{array} } \right] \).

After obtaining the image Jacobian matrix pseudo-inverse \( J^{ + } \), normalizing its column vectors yields \( \left[ {n_{x} ,n_{y} ,n_{z} } \right]^{T} \) and \( \left[ {o_{x} ,o_{y} ,o_{z} } \right]^{T} \), and their cross product gives \( \left[ {a_{x} ,a_{y} ,a_{z} } \right]^{T} \). The reciprocals of the norms of the first and second column vectors of \( J^{ + } \) are \( k_{x} \) and \( k_{y} \), respectively:

$$ \left\{ \begin{aligned} & \left[ {\begin{array}{*{20}c} {n_{x} } & {n_{y} } & {n_{z} } \\ \end{array} } \right]^{T} = \frac{{\left[ {\begin{array}{*{20}c} {J_{11}^{ + } } & {J_{21}^{ + } } & {J_{31}^{ + } } \\ \end{array} } \right]^{T} }}{{\left\| {\left[ {\begin{array}{*{20}c} {J_{11}^{ + } } & {J_{21}^{ + } } & {J_{31}^{ + } } \\ \end{array} } \right]^{T} } \right\|}} \hfill \\ & \left[ {\begin{array}{*{20}c} {o_{x} } & {o_{y} } & {o_{z} } \\ \end{array} } \right]^{T} = \frac{{\left[ {\begin{array}{*{20}c} {J_{12}^{ + } } & {J_{22}^{ + } } & {J_{32}^{ + } } \\ \end{array} } \right]^{T} }}{{\left\| {\left[ {\begin{array}{*{20}c} {J_{12}^{ + } } & {J_{22}^{ + } } & {J_{32}^{ + } } \\ \end{array} } \right]^{T} } \right\|}} \hfill \\ & \left[ {\begin{array}{*{20}c} {a_{x} } & {a_{y} } & {a_{z} } \\ \end{array} } \right]^{T} = \left[ {\begin{array}{*{20}c} {n_{x} } & {n_{y} } & {n_{z} } \\ \end{array} } \right]^{T} \times \left[ {\begin{array}{*{20}c} {o_{x} } & {o_{y} } & {o_{z} } \\ \end{array} } \right]^{T} \hfill \\ & k_{x} = \frac{1}{{\left\| {\left[ {\begin{array}{*{20}c} {J_{11}^{ + } } & {J_{21}^{ + } } & {J_{31}^{ + } } \\ \end{array} } \right]^{T} } \right\|}} \hfill \\ & k_{y} = \frac{1}{{\left\| {\left[ {\begin{array}{*{20}c} {J_{12}^{ + } } & {J_{22}^{ + } } & {J_{32}^{ + } } \\ \end{array} } \right]^{T} } \right\|}} \hfill \\ \end{aligned} \right. $$
(6)

The calibration between the macro camera coordinate system \( (O_{c} - X_{c} Y_{c} Z_{c} ) \) and the operation coordinate system \( (O_{m} - X_{m} Y_{m} Z_{m} ) \) is performed in the same way, except that \( \Delta z_{c} \ne 0 \).
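As a concrete illustration of Eqs. (4)–(6), the following minimal Python sketch estimates \( J^{ + } \) from paired increments and decomposes it into the rotation columns and magnification factors; the sample data at the bottom are illustrative assumptions, not measurements from the paper.

```python
import numpy as np

def calibrate_image_jacobian(delta_m, delta_uv):
    """Estimate the pseudo-inverse image Jacobian J+ from paired increments.

    delta_m : (3, n) array of Cartesian increments (dx_m, dy_m, dz_m) per move.
    delta_uv: (2, n) array of image increments (du, dv) per move.
    Returns J+ (3x2), the rotation columns n, o, a, and the scale factors kx, ky.
    """
    A = np.asarray(delta_m, dtype=float)        # Eq. (4): A = J+ * B
    B = np.asarray(delta_uv, dtype=float)
    # Eq. (5): least-squares solution, valid when B has two independent columns
    J_plus = A @ B.T @ np.linalg.inv(B @ B.T)

    # Eq. (6): column norms give 1/kx, 1/ky; normalized columns give n and o
    kx = 1.0 / np.linalg.norm(J_plus[:, 0])
    ky = 1.0 / np.linalg.norm(J_plus[:, 1])
    n = J_plus[:, 0] * kx
    o = J_plus[:, 1] * ky
    a = np.cross(n, o)                           # third column of R_s^m
    return J_plus, n, o, a, kx, ky

if __name__ == "__main__":
    # Illustrative data only: three in-plane moves of the feature point (um / pixels)
    delta_m  = np.array([[10.0,  0.0, 5.0],
                         [ 0.0, 10.0, 5.0],
                         [ 0.0,  0.0, 0.0]])
    delta_uv = np.array([[25.0,  1.0, 13.0],
                         [ 1.0, 25.0, 13.0]])
    J_plus, n, o, a, kx, ky = calibrate_image_jacobian(delta_m, delta_uv)
    print(J_plus, kx, ky)
```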

2.3 Macro-to-micro FOV positioning

Initially, the microgripper, the injection needle, and the embryo are placed freely in the macro-FOV, as shown in Fig. 4a. The first step of microinjection is to position these microtools and the embryo into the micro-FOV, which is determined by the microscope objective lens. When conducted manually, this process is time-consuming, since it requires the operator to move the microtools and the cell towards the lens while monitoring the screen to see whether the microgripper jaws, the injection needle, and the cell appear in the micro-FOV.

Fig. 4 The micro-FOV is located at the microscope objective lens. a The microgripper and cells are initially not in the micro-FOV. b A black area appears when the gripper jaws make contact with the liquid

To automate this time-consuming manual process, an extra macro camera is added to the robotic system. Taking the microgripper as an example, the critical steps of the proposed macro-to-micro FOV positioning algorithm are as follows. (1) Recognition of the objective lens, where the micro-FOV is located, using template matching; once the objective lens is matched, the matching rectangular area and its boundary are created, and this rectangle acts as the domain for searching the micro-FOV. (2) Recognition of the microgripper jaws. (3) Moving the microgripper into the searching domain. (4) Moving the microgripper jaws along the designed path and steps until the jaws appear in the micro-FOV. (5) Locating the microgripper jaws in the micro-FOV using the proposed grid-line method. The microgripper jaws are then positioned in the micro-FOV.

2.4 Auto-focusing

To effectively grip and immobilize the cell, the microgripper jaws must be moved down into the liquid culture medium and positioned at the same height as the cell. The cell can be focused automatically with various algorithms. These algorithms rely on two assumptions: (1) the background image remains unchanged; (2) the objects do not move during focusing. However, for auto-focusing of the microgripper jaws, neither assumption holds. The challenge is that when the gripper jaws move down and make contact with the liquid, a black area appears in the micro-FOV and a lot of image information is lost, as shown in Fig. 4b. In addition, the black area moves together with the gripper jaws as they are lowered into the liquid. Furthermore, in practice, the microgripper jaws can vibrate due to the compliance of the structure, and the cell is likely to be driven to move in the liquid as well.

In this work, we conduct the auto-focusing process in two separate steps. (1) First, contact and immersion recognition algorithms are proposed to guide the microgripper jaws until they are fully immersed in the water. (2) Second, the matching template used for recognition of the gripper jaws serves as an active window for auto-focusing, so that the influence of jaw vibration is reduced to a minimum. This active window is re-created by template matching of the microgripper jaws at every focusing step. As a result, the microgripper jaws remain stationary within the window, and existing algorithms can be applied. We then compare and evaluate eight auto-focusing algorithms to provide insight and guidance for selecting an appropriate algorithm.

3 Macro-to-micro FOV positioning

Initially, the microgripper is placed away from the objective lens. Since the micro-FOV is captured by the micro camera via the objective lens, an extra macro camera is used to create a macro-FOV that contains the lens. In order to position the microgripper jaws into the micro-FOV, the first step is to locate the objective lens in the macro-FOV. In this work, we use a rather large rectangular template for matching and recognition of the objective lens, based on two considerations. (1) The droplet of liquid in the petri dish can distort the contour of the objective lens, and a larger template results in more accurate matching. (2) The matching area is used as the searching domain for the micro-FOV; therefore, the larger the rectangular template, the larger the searching domain. In this work, the normalized correlation coefficient template matching algorithm (Chen et al. 2007) is used to recognize the objective lens. This algorithm compensates for intensity variations among different images, which makes it more suitable for situations where direct light is present. The matching score is defined as

$$ R = \frac{{\mathop \sum \nolimits_{{x^{\prime},y^{\prime}}} T^{\prime } \left( {x^{'} ,y^{'} } \right) \times I^{\prime } \left( {x + x^{'} ,y + y^{'} } \right)}}{{\sqrt {\mathop \sum \nolimits_{{x^{\prime},y^{\prime}}} T^{\prime } \left( {x^{'} ,y^{\prime}} \right)^{2} } \times \sqrt {\mathop \sum \nolimits_{{x^{\prime},y^{\prime}}} I^{\prime } \left( {x + x^{'} ,y + y^{'} } \right)^{2} } }} $$
(7)

where

\( T^{\prime } \left( {x, y} \right) = T\left( {x, y} \right) - \frac{{\mathop \sum \nolimits_{{x^{\prime } ,y^{\prime } }} T\left( {x^{\prime } , y^{\prime } } \right)}}{{\left( {w \times h} \right)}} \) and \( I^{\prime } \left( {x + x^{\prime } , y + y^{\prime } } \right) = I\left( {x + x^{\prime } , y + y^{\prime } } \right) - \frac{{\mathop \sum \nolimits_{x,y} I\left( {x, y} \right)}}{{\left( {w \times h} \right)}} \), where \( T\left( {x, y} \right) \) is the pixel intensity at point \( \left( {x, y} \right) \) in the template, \( I\left( {x, y} \right) \) is the pixel intensity at point \( \left( {x, y} \right) \) in the image to be searched, \( w \) and \( h \) are the width and height of the template, respectively, and \( R \) represents the degree of matching between the searching domain and the template. The closer the value of \( R \) is to 1, the higher the degree of matching between the template and the matched image region.
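For illustration, the matching score of Eq. (7) corresponds to the normalized correlation coefficient mode of OpenCV's matchTemplate. The sketch below shows how the objective lens, and hence the searching domain, could be located this way; the file names and acceptance threshold are illustrative assumptions rather than values from the paper.

```python
import cv2

MIN_MATCH_SCORE = 0.7  # assumed acceptance threshold, not a value from the paper

def locate_objective_lens(macro_frame, lens_template):
    """Locate the objective lens in the grayscale macro-FOV image via
    normalized correlation template matching (Eq. (7), TM_CCOEFF_NORMED).
    Returns the matching rectangle (x, y, w, h) and the score R."""
    result = cv2.matchTemplate(macro_frame, lens_template, cv2.TM_CCOEFF_NORMED)
    _, max_score, _, (x, y) = cv2.minMaxLoc(result)
    h, w = lens_template.shape
    return (x, y, w, h), max_score

# Example usage (file names are placeholders):
# frame = cv2.imread("macro_view.png", cv2.IMREAD_GRAYSCALE)
# template = cv2.imread("lens_template.png", cv2.IMREAD_GRAYSCALE)
# rect, score = locate_objective_lens(frame, template)
# searching_domain = rect if score >= MIN_MATCH_SCORE else None
```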

When the objective lens is located, the searching domain is determined, as shown in Fig. 5, in which the red rectangle represents the searching domain. It is then necessary to recognize the microgripper jaws in the macro-FOV. Since the microgripper is almost transparent, especially under the microscope illumination, it is difficult to recognize the jaws via direct template matching. Because the two pads of the PCB have a rather large rectangular area, the pads are recognized via template matching, and the location of the microgripper jaws is then estimated from the geometric relations between the two pads and the center point of the jaws. To facilitate image matching, two pieces of red rectangular paper were glued to the two pads, as shown in Fig. 5.

Fig. 5 Illustration of the Macro-to-Micro FOV positioning algorithm

With the locations of the searching area and the microgripper jaws known, the microgripper jaws can be moved to the lower-left corner of the searching area to begin the search for the micro-FOV. In this work, we designed two searching paths, a serpentine path (Path A) and a zigzag path (Path B), for the gripper jaws to follow, as shown in Fig. 6.

Fig. 6 Comparison of different search paths. a Path A. b Path B. c Scatter plots of the positions at which the gripper first appears in the micro-FOV

To rapidly detect the microgripper jaws in the micro-FOV, we divide the micro-FOV into 12 small areas using \( 5 \times 4 \) grid lines. The grid lines act as alert lines for detecting the presence of the gripper jaws. The advantage of grid-line detection is that there is no need to search the whole micro-FOV: by monitoring only the changes of the grey levels within the grid lines, the presence of the gripper jaws can be detected. The width of the grid lines is set to 20 pixels, considering both detection efficiency and robustness. In addition, the number of grid lines is set to \( 5 \times 4 \) so that the distance between adjacent horizontal lines is roughly equal to the diameter of the circle formed by the jaws. As a result, when the jaws fully appear in the micro-FOV, two horizontal lines are triggered at the same time. In this manner, not only can the gripper jaws be detected, but their positions can also be estimated.
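A minimal sketch of the grid-line detection idea is given below: only the pixels lying on the alert lines are monitored, and the jaws are flagged when the mean grey level of any line deviates from its reference value by more than a threshold. The line placement (borders included, giving 12 cells) and the trigger threshold are assumptions for illustration.

```python
import numpy as np

LINE_WIDTH = 20          # pixels, as in the paper
N_VERT, N_HORZ = 5, 4    # 5 vertical and 4 horizontal alert lines
GREY_THRESHOLD = 15.0    # assumed trigger threshold on mean grey-level change

def grid_line_means(image):
    """Mean grey level on each alert line of a grayscale frame
    (borders included, so the lines partition the FOV into 4 x 3 = 12 cells)."""
    h, w = image.shape
    means = []
    for i in range(N_VERT):                      # vertical lines
        x = i * (w - LINE_WIDTH) // (N_VERT - 1)
        means.append(float(image[:, x:x + LINE_WIDTH].mean()))
    for j in range(N_HORZ):                      # horizontal lines
        y = j * (h - LINE_WIDTH) // (N_HORZ - 1)
        means.append(float(image[y:y + LINE_WIDTH, :].mean()))
    return np.array(means)

def jaws_detected(reference_frame, current_frame):
    """True when at least one alert line reports a significant grey-level change;
    the per-line flags also indicate roughly where the jaws entered the FOV."""
    diff = np.abs(grid_line_means(current_frame) - grid_line_means(reference_frame))
    triggered = diff > GREY_THRESHOLD
    return bool(triggered.any()), triggered
```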

Assisted by the grid-line detection method, the performance of the two searching paths is evaluated and compared. As shown in Fig. 6, the black dots represent the positions of the center of the two gripper jaws when the jaws first fully appear in the micro-FOV and are detected while following Path A, and the purple dots represent the corresponding positions when following Path B. Fifty experiments were conducted, and the distributions of the dots, i.e., the likely positions of first detection in the micro-FOV, are represented by the blue and red rectangular areas for Path A and Path B, respectively. It is clear that, when following Path A, the microgripper jaws are more likely to first appear near a corner of the micro-FOV. This is desirable because the farther the microgripper jaws appear from the center of the micro-FOV, the more room they have to adjust their positions. In this work, we therefore select Path A for searching the micro-FOV. Using the same algorithms and methods, the cell and the injection needle can be positioned in the micro-FOV as well.
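For illustration, the waypoints of a serpentine (Path A) scan over the searching rectangle could be generated as sketched below; the step size is an assumption and should be matched to the size of the micro-FOV in practice.

```python
def serpentine_waypoints(x0, y0, width, height, step):
    """Generate waypoints covering a rectangular searching domain with a
    serpentine (boustrophedon) path, starting from the lower-left corner."""
    waypoints = []
    y = y0
    leftwards = False
    while y <= y0 + height:
        row = list(range(x0, x0 + width + 1, step))
        if leftwards:
            row.reverse()
        waypoints.extend((x, y) for x in row)
        leftwards = not leftwards
        y += step
    return waypoints

# Example: a 2 mm x 1.5 mm searching domain scanned in 250 um steps (assumed values)
# path = serpentine_waypoints(0, 0, 2000, 1500, 250)
```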

4 Auto-focusing algorithms

When the microgripper jaws and the cell appear in the micro-FOV, they are recognized via a template matching algorithm and a Hough circle detection algorithm, respectively. Next, the microgripper jaws are moved to a position directly above the cell. At this point, the microgripper jaws are not yet in the liquid. To grip and hold the cell, the gripper jaws need to be moved down into the liquid, positioned at the same height as the cell, and focused.

When the gripper jaws make contact with the surface of the liquid, a black area appears around the contact area due to the surface tension of the liquid and refraction. This black area causes the template matching and auto-focusing algorithms to fail, because it changes dynamically as the microgripper jaws move and a lot of image information is lost. Since the black area causes a significant increase of the grey level and a decrease of the variance value, at every step the microgripper jaws are detected via template matching while moving down, and the average grey level of the matching area is calculated. If the average grey level exceeds a certain threshold, the microgripper jaws are deemed to be in contact with the surface of the liquid. The microgripper jaws then continue to move down a certain distance into the liquid until they are fully immersed. Based on 200 experiments, this distance is around 560 μm; that is, from the moment of first contact, the microgripper needs to move down by around 560 μm to be fully immersed in the liquid. Once the average grey level falls back below the threshold, the black area has moved outside the matching area of the microgripper jaws, as shown in Fig. 7.
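A minimal sketch of this contact and immersion detection logic is given below. Because the black contact shadow lowers the raw pixel intensities, the sketch measures the mean darkness (255 minus the mean grey level) of the jaw-matching window and compares it with a threshold; the threshold value and the frame/template handling are illustrative assumptions, while the 560 μm travel is the empirical value reported above.

```python
import cv2
import numpy as np

DARKNESS_THRESHOLD = 60.0   # assumed threshold on mean darkness (255 - grey level)
IMMERSION_TRAVEL_UM = 560   # empirical travel from first contact to full immersion

def jaw_window_darkness(frame, template):
    """Track the jaws by template matching in a grayscale frame and return the
    mean darkness of the matched window (large values indicate the black area)."""
    result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, (x, y) = cv2.minMaxLoc(result)
    h, w = template.shape
    window = frame[y:y + h, x:x + w]
    return 255.0 - float(np.mean(window))

def contact_detected(frame, template):
    """True when the jaws are touching the liquid surface."""
    return jaw_window_darkness(frame, template) > DARKNESS_THRESHOLD

def immersion_complete(frame, template):
    """True once the shadow has left the matching window, i.e. full immersion."""
    return jaw_window_darkness(frame, template) < DARKNESS_THRESHOLD
```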

Fig. 7 The microgripper jaws move down and are immersed into the liquid. a A black area appears at the contact area. b The microgripper jaws are fully immersed in the liquid, and the black contact area moves outside the microgripper jaws. The red rectangle represents the template matching area

According to the imaging principle of the microscope, the closer the object is to the focal plane, the clearer the image; the farther it is from the focal plane, the more blurred the image. Therefore, the sharpness of the image can be adjusted by controlling the relative distance between the object and the focal plane of the microscope. Image sharpness is a parameter that characterizes how well focused an image is. In the spatial domain, it is reflected in whether the boundaries and details of the image are clear; in the frequency domain, in whether the high-frequency components of the image are abundant; and in terms of statistical characteristics, in whether the greyscale distribution is uniform. Accordingly, evaluation functions for judging image sharpness can be categorized into spatial, frequency, and statistical methods. In order to identify the most suitable sharpness evaluation function, we selected the eight most commonly used classic sharpness evaluation functions for analysis and comparison (Nayar 1994; Subbarao and Choi 1993; Groen et al. 1985); a minimal implementation sketch of several of them is given after the list.

(1) Frequency selective weighted median filter (\( FSWM \)) (Zhao et al. 2019): this algorithm calculates the sum of the squares of the median differences of the grey levels of three consecutive pixels in the horizontal and vertical directions as

    $$ F_{FSWM} = S_{x} + S_{y} $$
    (8)

where \( S_{x} = \mathop \sum \nolimits_{M} \mathop \sum \nolimits_{N} \left( {f\left( {I\left( {x - 2,y} \right),I\left( {x - 1, y} \right),I\left( {x,y} \right)} \right) - f\left( {I\left( {x + 2,y} \right),I\left( {x + 1, y} \right),I\left( {x,y} \right)} \right)} \right)^{2} \) and \( S_{y} = \mathop \sum \nolimits_{M} \mathop \sum \nolimits_{N} \left( {f\left( {I\left( {x,y - 2} \right),I\left( {x, y - 1} \right),I\left( {x,y} \right)} \right) - f\left( {I\left( {x,y + 2} \right),I\left( {x, y + 1} \right),I\left( {x,y} \right)} \right)} \right)^{2} \), \( I\left( {x, y} \right) \) is the grey level intensity of pixel \( \left( {x, y} \right) \), \( f\left( {x, y, z} \right) \) returns the median of \( x \), \( y \), and \( z \), and \( M \) and \( N \) are the width and height of the image. \( F_{FSWM} \) is the evaluation value; the larger \( F_{FSWM} \) is, the closer the image is to the focal plane.

(2) High-pass filter image absolute central moment (\( Hpacm \)) (Permana et al. 2016): this is a statistical algorithm that uses the frequency with which each pixel value appears, defined as

    $$ F_{Hpacm} = \mathop \sum \limits_{i = 0}^{255} \left| {i - mean} \right| \times p_{i} $$
    (9)

where \( mean = \frac{{\mathop \sum \nolimits_{M} \mathop \sum \nolimits_{N} \left( {I\left( {x, y} \right)} \right)}}{M \times N} \) and \( p_{i} = \frac{h\left( i \right)}{M \times N} \) is the frequency of pixels with intensity \( i \). \( F_{Hpacm} \) is the evaluation value; the larger \( F_{Hpacm} \) is, the closer the image is to the focal plane.

(3) Energy Laplace (Geusebroek et al. 2000): this algorithm convolves the image with the mask

    $$ {\text{h}} = \left[ {\begin{array}{*{20}c} 1 & 4 & 1 \\ 4 & { - 20} & 4 \\ 1 & 4 & 1 \\ \end{array} } \right] $$

to compute the second derivative \( C\left( {x,y} \right) \). The final output is the sum of the squares of the convolution results.

$$ F_{Energy} = \mathop \sum \limits_{M} \mathop \sum \limits_{N} C\left( {x,y} \right)^{2} $$
(10)

where \( F_{Energy} \) is the evaluation value; the larger \( F_{Energy} \) is, the closer the image is to the focal plane.

(4) Sum-Modulus-Difference (\( SMD \)) (Geusebroek et al. 2000): this algorithm calculates the sum of the absolute differences of the grey levels between consecutive pixels in the horizontal and vertical directions as

    $$ F_{SMD} = SMD_{x} + SMD_{y} $$
    (11)

where \( SMD_{x} = \mathop \sum \nolimits_{x} \mathop \sum \nolimits_{y} \left| {I\left( {x, y} \right) - I\left( {x, y - 1} \right)} \right| \) and \( SMD_{y} = \mathop \sum \nolimits_{x} \mathop \sum \nolimits_{y} \left| {I\left( {x, y} \right) - I\left( {x + 1, y} \right)} \right| \). \( F_{SMD} \) is the evaluation value; the larger \( F_{SMD} \) is, the closer the image is to the focal plane.

(5) AutoCorrelation (Wang et al. 2007; Lu et al. 2011): this algorithm calculates the correlation of the grey levels between adjacent pixels as

    $$ F_{AutoCorr} = \mathop \sum \limits_{M} \mathop \sum \limits_{N} I\left( {x, y} \right) \times I\left( {x + 1, y} \right) - I\left( {x, y} \right) \times I\left( {x + 2, y} \right) $$
    (12)

where \( F_{AutoCorr} \) is the evaluation value; the larger \( F_{AutoCorr} \) is, the closer the image is to the focal plane.

(6) Tenenbaum gradient (Tenengrad) (Mattos et al. 2009; Nakajima et al. 2017): this algorithm convolves the image with Sobel operators and then sums the squares of the gradient vector components as

    $$ F_{Tenengrad} = \mathop \sum \limits_{M} \mathop \sum \limits_{N} S_{x} \left( {x, y} \right)^{2} + S_{y} \left( {x, y} \right)^{2} $$
    (13)
(7) Brenner gradient (Desai et al. 2007): this algorithm computes the first difference between a pixel and its neighbour at a horizontal/vertical distance of two pixels as

    $$ F_{Brenner} = \mathop \sum \limits_{M} \mathop \sum \limits_{N} \left( {I\left( {x + 2, y} \right) - I\left( {x, y} \right)} \right)^{2} $$
    (14)

where only terms satisfying \( \left( {I\left( {x + 2, y} \right) - I\left( {x, y} \right)} \right)^{2} \ge \theta \) are included in the sum, with \( \theta \) a preset threshold.

(8) Variance (Lu et al. 2011): this algorithm computes the variation in grey level among image pixels. It uses a power function to amplify larger deviations from the mean intensity \( \mu \) instead of simply amplifying high intensity values, as

    $$ F_{variance} = \frac{{\mathop \sum \nolimits_{M} \mathop \sum \nolimits_{N} \left( {I\left( {x, y} \right) - \mu } \right)^{2} }}{M \times N} $$
    (15)
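The sketch below implements several of the evaluation functions above (Brenner, Variance, Tenengrad, and Hpacm) with NumPy and OpenCV as a direct reading of Eqs. (9) and (13)–(15); the Brenner threshold \( \theta \) is an assumed default, and the optional high-pass prefiltering for Hpacm is omitted.

```python
import cv2
import numpy as np

# All functions expect an 8-bit grayscale image (numpy array of dtype uint8).

def brenner(img, theta=0.0):
    """Brenner gradient, Eq. (14): squared 2-pixel differences not below theta."""
    d = (img[:, 2:].astype(float) - img[:, :-2].astype(float)) ** 2
    return float(d[d >= theta].sum())

def variance(img):
    """Variance, Eq. (15): mean squared deviation from the mean intensity."""
    imgf = img.astype(float)
    return float(((imgf - imgf.mean()) ** 2).mean())

def tenengrad(img):
    """Tenenbaum gradient, Eq. (13): sum of squared Sobel gradient components."""
    sx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
    sy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
    return float((sx ** 2 + sy ** 2).sum())

def hpacm(img):
    """Absolute central moment, Eq. (9), computed from the grey-level histogram
    (the high-pass prefilter of the original Hpacm method is omitted here)."""
    hist = cv2.calcHist([img], [0], None, [256], [0, 256]).ravel()
    p = hist / hist.sum()
    mean = float((np.arange(256) * p).sum())
    return float((np.abs(np.arange(256) - mean) * p).sum())
```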

Existing auto-focusing algorithms assume that the object to be focused is stationary. As a result, they cannot be applied directly to auto-focusing of the microgripper jaws, since small movements and vibrations arising from structural compliance and external disturbances lead to apparent motion in the micro-FOV. To solve this problem, at every movement step of the jaws, template matching is conducted first, and the matched rectangular area acts as the auto-focusing image, as shown in Fig. 7, in which the red rectangle is the template matching area. In this manner, the relative position of the microgripper jaws within the window always remains unchanged, since one common template is used.
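A sketch of this active-window focusing loop is given below: at each height the jaws are re-located by template matching, the sharpness of the matched window is scored, and the best-scoring height is kept. The motion and image-grabbing interfaces (move_z, grab_frame) are hypothetical placeholders, and the variance metric merely stands in for whichever evaluation function is finally selected.

```python
import cv2
import numpy as np

def window_sharpness(frame, template):
    """Re-locate the jaws by template matching and score the matched window
    with a simple variance metric (any of the eight functions could be used)."""
    result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, (x, y) = cv2.minMaxLoc(result)
    h, w = template.shape
    window = frame[y:y + h, x:x + w].astype(float)
    return float(((window - window.mean()) ** 2).mean())

def autofocus_jaws(move_z, grab_frame, template, z_positions):
    """Scan the given heights and return the one giving the sharpest jaw window.

    move_z(z)    -- hypothetical command moving the micromanipulator to height z
    grab_frame() -- hypothetical call returning the current 8-bit micro-FOV image
    """
    best_z, best_score = None, -np.inf
    for z in z_positions:
        move_z(z)
        score = window_sharpness(grab_frame(), template)
        if score > best_score:
            best_z, best_score = z, score
    move_z(best_z)
    return best_z, best_score
```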

In order to evaluate the performance of the existing auto-focusing algorithms, we first move the microgripper jaws downward manually to the clearest position, as judged by the operator. Next, the microgripper jaws are moved both upward and downward with a step length of 20 μm; it can be seen clearly in the micro-FOV that the jaws become blurred at the two extreme positions. Then, 75 images are collected for the upward and downward movements, respectively, one at each step, i.e., over a distance of 1500 μm on each side. The sharpness value of every image is calculated with each of the existing auto-focusing algorithms, see Fig. 8.

Fig. 8 Evaluation of different auto-focusing algorithms. Along the optical axis of the microscope, with the jaws positioned directly above the cell as the starting point, the abscissa is the downward distance of the jaws from the starting point and the ordinate is the sharpness evaluation value

For convenience of comparison, the sharpness values calculated by all functions are normalized to the range 0–1. It can be seen from Fig. 8 that the curves of the \( FSWM \), \( Hpacm \), and \( AutoCorr \) algorithms are smoother and more continuous. The execution time of the algorithms is crucial for real-time micro-manipulation. Table 1 lists the execution times of the eight algorithms; the \( Hpacm \), \( SMD \), and \( Variance \) algorithms are the fastest.

Table 1 Algorithm Execution Time (unit: s)

We use the number of local maximum points in each curve to evaluate the monotonicity of the sharpness evaluation functions: the fewer local maximum points a function has, the better its monotonicity. Table 2 counts the number of local maximum points for the different evaluation curves. The statistical range, shown in Fig. 8, covers the abscissa interval from 1500 to 1830 μm (between the two black vertical lines in Fig. 8). Among the algorithms, \( FSWM \), \( Hpacm \), \( AutoCorr \), \( Tenengrad \), and \( Brenner \) have only one local maximum point, which is the global maximum; these algorithms therefore have excellent monotonicity.

Table 2 Number of Local Maximum points

We use the absolute difference between the position of the maximum of the sharpness curve and the reference position as the index for evaluating the unbiasedness of an algorithm: the smaller the difference, the more accurate the sharpness evaluation. As shown in Fig. 8, the image corresponding to 1680 μm on the horizontal axis is taken as the reference image, at which the gripper jaws lie in the focal plane passing through the cell center. Table 3 lists the position difference values for the eight algorithms. The Energy Laplace, \( Hpacm \), \( AutoCorr \), and \( Variance \) algorithms have the smallest position differences and therefore relatively good unbiasedness.

Table 3 Position Deviation Absolute Value

From the above analysis and comparison, the \( Hpacm \) algorithm has the best overall performance in terms of execution time, monotonicity, and unbiasedness. Therefore, this paper chooses the \( Hpacm \) algorithm as the sharpness evaluation function; its curve is marked by the red asterisk in Fig. 8.

5 Experimental studies

A total of 100 experiments are conducted to validate the macro-to-micro searching strategy and the grid detection algorithm. The experimental results are shown in Table 4. In all of the experiments, the microgripper is automatically moved into the micro-FOV, giving a success rate of 100%. The average completion time is 30 s, with the shortest and longest completion times being 18 s and 40 s, respectively. In contrast, the average operating time for a skilled operator is around 100 s. In other words, the average time required to complete an experiment with the automated control system implemented by the proposed scheme is only about one-third of that of a human operator. However, when the liquid droplet is large, the matching accuracy for the microscope lens is low, and the matching area can be far away from the lens; as a result, the microgripper requires a longer path to move into the micro-FOV and takes a longer time.

Table 4 Macro-micro Operation Experiments

We also conducted 100 experiments to validate the auto-focusing algorithm proposed in this paper. The experimental results are shown in Table 5: the success rate of the cell immobilization experiments is 90%, and the average completion time is 25 s. In comparison, the success rate of manual immobilization by experienced technicians is 81%, with an average completion time of 63 s. The main reason for the failed experiments is that, when the jaws of the microgripper are aligned with the center plane of the embryo cell, the upward and downward movements of the jaws can cause the jaws to vibrate. This vibration is transmitted to the culture medium, and the embryo may move away.

Table 5 Cell Immobilization Experiments

The microgripper can firmly grip the cell only when the diameter of the cell is larger than the diameter of the inner envelope circle of the gripper jaws in the closed position. The radius of the inner envelope circle of the jaws, the cell radius, and half of the clampable field form a right triangle, as shown in Fig. 9. The clampable field of the jaws is therefore

Fig. 9 Schematic diagram of the clamping area (r is the radius of the inner envelope circle of the jaws, R is the radius of the embryo cell, h is the length of the upper half of the cell that can be clamped, and H is the total length over which the cell can be clamped)

$$ H = 2 \times \sqrt {R^{2} - r^{2} } $$
(16)

When the jaws of the microgripper are completely closed, the diameter of the inner envelope circle is 400 μm, and the diameter of the zebrafish embryo is around 500 μm. The clampable field of the jaws is calculated to be 300 μm.
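Substituting r = 200 μm and R = 250 μm into the formula above provides a quick check of this value:

$$ H = 2 \times \sqrt {\left( {250\,\mu {\text{m}}} \right)^{2} - \left( {200\,\mu {\text{m}}} \right)^{2} } = 2 \times 150\,\mu {\text{m}} = 300\,\mu {\text{m}} $$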

6 Conclusion

In this paper, macro-to-micro positioning and auto-focusing algorithms and methods have been developed to automate the preparation process of fully automated single cell microinjection, including positioning all the microtools and the cell into the micro-FOV and bringing them into focus. By adding a macro camera, a macro-FOV is created in which the microtools, the cell, and the objective lens can be recognized. Following either a serpentine or a zigzag search path, the microgripper jaws can be positioned into the micro-FOV, and their presence is recognized by the proposed grid detection method. The contact of the gripper jaws with the liquid is detected from the variation of the grey level of the image. After the gripper jaws have been fully immersed in the liquid, eight auto-focusing algorithms are tested and evaluated. Finally, 100 experiments have been conducted, and success rates of 100% and 90% are achieved for the macro-to-micro positioning and auto-focusing algorithms, respectively. The time for the whole process is only one-third of that of the manual operation.