1 Introduction

Human vision is excellent and accurate at object recognition, but it has limitations: a human observer tires and cannot work as fast or as consistently as a computer [1]. Computer vision is therefore needed to assist in image recognition. Object matching refers to detecting a target object in a digital image and recognizing its pose (position, rotation angle, and scale). The object registration problem is to compare the model image with the target image when the two are not identical. Matching objects between two images whose appearances differ, owing to varying illumination, scale change, rotation, and partial distortion or occlusion of the object, is a challenging task.

Many studies have addressed object retrieval approaches and methods. Some have focused on complex scenes, such as crowded places or dense city buildings, using algorithms that extract robust feature points from an image pyramid. Lowe [2] proposed the scale-invariant feature transform (SIFT), Ker et al. [3] improved SIFT by applying principal component analysis to the normalized gradient patch, and Mikolajczyk et al. [4] proposed an extension of the SIFT descriptor named the gradient location and orientation histogram. Bay et al. [5] proposed speeded-up robust features (SURF), which is inspired by SIFT. Other studies have addressed object recognition through shape-based retrieval [6, 7]. Ling et al. [8] proposed the inner-distance shape context, an extension of shape contexts [9] that compares the similarity of corresponding points, where the inner distance is defined as the shortest path along the edge of an object shape and is invariant to shape articulation. Furthermore, Yang et al. [10] computed affine geometric invariants from the convex hull of an object to find correspondences between convex hull vertices, Wang et al. [11] used the nodes and edges of an object to form a histogram descriptor for matching, and Caetano et al. [12] proposed graph matching in Euclidean space to solve weighted graph matching problems.

Moreover, the Fourier shape descriptor, which improves retrieval accuracy by applying a Fourier transform to object information, has been proposed [13,14,15], and Foroosh et al. [16] extended phase correlation to sub-pixel registration by taking the mean of the phase correlation on down-sampled images. Guest et al. [17] introduced correspondence by sensitivity to movement, which selects features according to the reliability of their possible matches in two and three dimensions in medical and biological applications, while Montesinos et al. [18] proposed a first-order differential descriptor of the image function in the neighborhood of the detected control points. Pluim et al. [19] compared basic mutual-information registration with multi-resolution coarse-to-fine schemes for speed-up, and Suk et al. [20, 21] used an invariant shape descriptor to represent regions and later extended the method to incorporate point-based invariants. Approaches using 3D depth sensors have also been proposed [22,23,24].

Matching can also be performed via image segmentation. Zhang and Ji [25] proposed image segmentation using a Bayesian network for object detection, and Ferrari et al. [26] proposed contour segments for object detection. Furthermore, Petrakis [27] used dynamic programming for distorted and occluded object retrieval, in which transform model estimation is solved under geometric deformation of the target image. Fitzpatrick et al. [28] related image acquisition to the accuracy required for rigid-body point-based registration, while Bentoutou et al. [29] registered mutually shifted and blurred digital subtraction angiography images.

However, these previous methods are at a disadvantage for simple images, such as industrial components, where the background is mono-color and the object shape is simple: they are too computationally slow for industrial settings in which speed is one of the most important criteria, and previous works have not satisfied this requirement. In addition, a mono-color background and a simple object provide little texture and therefore few salient feature points, which causes problems when objects are occluded, since the remaining feature points may be insufficient and lead to object mismatching. The commercial tools of Matrox [30] and Cognex [31] are well established for simple object recognition, but they suffer from accuracy and precision problems, particularly with occlusion and scale-changed images.

In this paper, we address the challenge of improving the efficiency and reliability of object matching in image processing. The process is divided into three stages, namely edge detection, feature extraction, and object matching, as shown in Fig. 1. Edge detection is applied to the input images at a low level of abstraction; its purpose is to reduce undesired outliers and enhance the image data that are useful for further processing. Feature extraction is a key step of the matching system, in which the features encode unique, relevant information about the model and target objects. We propose a geometry-based vector mapping descriptor (VMD) for pattern matching, constructing one descriptor per feature in order to obtain corresponding feature points between the model and target images. Object matching is then performed with the constructed descriptors by finding corresponding feature points; the method copes with both distortion and occlusion and is invariant to geometric transformation.

Fig. 1
figure 1

Overview of entire system for geometric matching process

The rest of this paper is organized as follows. In Sect. 2, we introduce edge detection with edge enhancement using image sub-pixeling. Section 3 describes the geometric features that are detected, and in Sect. 4, we present the VMD algorithm that tests the match between model and target images. In Sect. 5, we report experiments conducted on three cases of image datasets and with a real-time camera, and Sect. 6 discusses the results, draws conclusions on the proposed algorithm, and outlines future work to improve the system.

2 Edge enhancement using image sub-pixeling

Geometric features are obtained from the boundaries of an object; we therefore need edge enhancement using sub-pixeling to obtain accurate edge information. Edge detection is divided into two steps. First, image filtering is applied to the raw image to reduce undesired effects, and the edge on which the geometric features are defined is extracted; in this step, sub-pixel units are used for accurate feature information. The second step is edge linking and thinning, which produces the pixel sequence of the edge and a single-pixel-wide edge required for further processing, since a smooth edge is needed for the geometric feature extraction performed at a later stage.

2.1 Edge extraction by sub-pixeling

The raw data from an input image are obtained from the camera and other datasets and usually contain noise that interferes with edge detection. The noise in an image can cause strong intensity differences considered as edges, and so a Gaussian filter is applied to eliminate it [32]. Light reflection can also cause deformation of an image that interferes with edge detection; thus, an intensity histogram is created over the whole image to investigate the light reflection effect since there will be a greater difference in the intensity histogram if part of an object is reflected by light [33]. The image becomes more suitable for extracting the edge once the noise and light reflection effects have been reduced.

In image processing, edges contain significant information for object detection, allowing one to distinguish an object from the background or from other objects. Edges can be detected where a change in light intensity occurs in an image, which produces a steep gradient. Edge detection algorithms include the Sobel operator, the Roberts cross operator, the Prewitt operator, the Laplacian of Gaussian, and the Canny edge detector [34], the latter currently being the most accurate edge detection algorithm [35]. However, these algorithms report an edge's position in pixel units, which limits how accurately features such as circles and lines can be detected: a sequence of pixels tends to form a jagged line instead of a smooth curve, which makes the recovered feature positions less accurate.

The use of sub-pixel units [36] in image processing algorithms has been suggested to improve the accuracy of position information. To increase the accuracy of the edge position, a sub-pixel edge detection algorithm based on the partial area effect is used to refine the pixel-level positions to sub-pixel positions. First, we obtain the edge with the Canny edge detector from the filtered input image; then, the edge position and its neighborhood gradient information are used to compute the sub-pixel positions.
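For illustration only, the following Python sketch shows one common way to refine Canny edge pixels to sub-pixel accuracy by fitting a parabola to the gradient magnitude along the gradient direction. It is a simplified stand-in for the partial-area-effect refinement of [36], and the filter sizes and thresholds are assumptions rather than the values used in this work.

```python
import cv2
import numpy as np

def subpixel_edges(gray, low=50, high=150):
    """Detect Canny edges on a single-channel image, then refine each edge pixel
    by fitting a parabola to the gradient magnitude along the gradient direction
    (a simplified stand-in for the partial-area-effect refinement of [36])."""
    gray = cv2.GaussianBlur(gray, (5, 5), 1.0)           # noise suppression
    edges = cv2.Canny(gray, low, high)                    # pixel-accurate edge map
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.hypot(gx, gy)

    points = []
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        if not (1 <= x < gray.shape[1] - 1 and 1 <= y < gray.shape[0] - 1):
            continue
        g = np.array([gx[y, x], gy[y, x]])                # gradient at the edge pixel
        norm = np.linalg.norm(g)
        if norm < 1e-9:
            points.append((float(x), float(y)))
            continue
        d = g / norm                                      # unit gradient direction
        xb, yb = int(round(x - d[0])), int(round(y - d[1]))
        xf, yf = int(round(x + d[0])), int(round(y + d[1]))
        m0, m1, m2 = mag[yb, xb], mag[y, x], mag[yf, xf]  # samples behind / at / ahead
        denom = m0 - 2.0 * m1 + m2
        offset = 0.0 if abs(denom) < 1e-9 else 0.5 * (m0 - m2) / denom
        offset = float(np.clip(offset, -0.5, 0.5))        # parabola vertex offset
        points.append((x + offset * d[0], y + offset * d[1]))
    return np.array(points)
```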

2.2 Edge enhancement by least square error estimation

After the edges have been detected, each edge needs to be identified as belonging to a separate object, although if one edge is contained within another edge, they are considered part of the same object. A group of edges is labeled as an object, and in the occlusion case each labeled object can contain more than one model instance. For further processing, the edges need to be thin single lines so that the geometric feature extraction step can proceed with greater accuracy [37].

Although the detected sub-pixel edge is more suitable than a pixel-level edge, it can be processed further to achieve a smoother edge, which improves geometric feature extraction. Figure 2 shows the result of edge enhancement: the blue line is clearly much smoother than the red line. The feature edge indicated by the red line is made up of many short lines instead of one long line because it zigzags, causing many line intersections that must be dealt with during geometric feature extraction. Many spurious features cause time-consuming computation, which we need to avoid. Another issue is that the zigzagging introduces randomness, which makes it difficult to define consistent rules such as descriptors. Therefore, the enhancement procedure is necessary if we want stable features.

Fig. 2
figure 2

Edge enhancement result. The red line is before and the blue line is after enhancement

To achieve this, let \(E_{kj} = (x_{kj} ,y_{kj} )\) denote the jth point of the kth edge (k = 1, 2, …, K), where \(x_{kj} \;{\text{and}}\;y_{kj}\) are the respective x and y coordinates of the point. If the kth edge has N points, line fitting and circle fitting are carried out for the (2n + 1) edge points \([E_{k(j - n)}, E_{k(j - n + 1)}, \ldots, E_{k(j + n)}]\) for every j in the interval (n, N − n).

Figure 3 shows the result of line fitting. As can be seen, dots are scattered along the red line. Initially, we only have the scattered dots, and we have to find the line of best fit, that is, the line whose total distance from the dots is minimal (the least squared error).

Fig. 3
figure 3

An example of line fitting using the least squares method

The least squares line-fitting method is used to represent the line \(ax + by = 1\). The a and b values can be approximated from the following equations:

$$\left[ {\begin{array}{*{20}c} {x_{k(j - n)} } & {y_{k(j - n)} } \\ \vdots & \vdots \\ {x_{k(j + n)} } & {y_{k(j + n)} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} a \\ b \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} 1 \\ \vdots \\ 1 \\ \end{array} } \right],$$
(1)
$$\left[ {\begin{array}{*{20}c} a \\ b \\ \end{array} } \right] = pinv\left( {\left[ {\begin{array}{*{20}c} {x_{k(j - n)} } & {y_{k(j - n)} } \\ \vdots & \vdots \\ {x_{k(j + n)} } & {y_{k(j + n)} } \\ \end{array} } \right]} \right)\left[ {\begin{array}{*{20}c} 1 \\ \vdots \\ 1 \\ \end{array} } \right].$$
(2)

The line fitting error \(E_{l}\) is defined as

$$E_{l} = \frac{{\sum\nolimits_{i = j - n}^{j + n} {\frac{{\left| {1 - by_{ki} - ax_{ki} } \right|}}{{\sqrt {a^{2} + b^{2} } }}} }}{2n + 1}.$$
(3)
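For illustration (not part of the original implementation), the line fit of Eqs. (1)–(3) can be written in a few lines of numpy using the pseudo-inverse of Eq. (2):

```python
import numpy as np

def fit_line(points):
    """Least-squares fit of the line a*x + b*y = 1 to a (2n+1) x 2 array of edge
    points, following Eqs. (1)-(3). Returns (a, b) and the mean fitting error."""
    rhs = np.ones(len(points))
    a, b = np.linalg.pinv(points) @ rhs                      # Eq. (2)
    residuals = np.abs(1.0 - a * points[:, 0] - b * points[:, 1])
    error = np.mean(residuals / np.hypot(a, b))              # Eq. (3)
    return (a, b), error
```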

Figure 4 shows the result of circle fitting, in which the blue dots are scattered around the blue circle. Initially, we only have the scattered dots, and we have to find the circle of best fit, that is, the circle whose total distance from the dots is minimal (the least squared error).

Fig. 4
figure 4

An example of circle fitting

The least squares circle-fitting method is used to represent the circle \((x - x_{c} )^{2} + (y - y_{c} )^{2} = r^{2}\). The \(x_{c}\), \(y_{c}\), and c values are approximated from the following equations:

$$\left[ {\begin{array}{*{20}c} { - 2x_{k(j - n)} } &\quad { - 2y_{k(j - n)} } &\quad 1 \\ \vdots &\quad \vdots &\quad \vdots \\ { - 2x_{k(j + n)} } &\quad { - 2y_{k(j + n)} } &\quad 1 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {x_{c} } \\ {y_{c} } \\ c \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} { - x_{k(j - n)}^{2} - y_{k(j - n)}^{2} } \\ \vdots \\ { - x_{k(j + n)}^{2} - y_{k(j + n)}^{2} } \\ \end{array} } \right],$$
(4)
$$\left[ {\begin{array}{*{20}c} {x_{c} } \\ {y_{c} } \\ c \\ \end{array} } \right] = pinv\left( {\left[ {\begin{array}{*{20}c} { - 2x_{k(j - n)} } & { - 2y_{k(j - n)} } & 1 \\ \vdots & \vdots & \vdots \\ { - 2x_{k(j + n)} } & { - 2y_{k(j + n)} } & 1 \\ \end{array} } \right]} \right)\left[ {\begin{array}{*{20}c} { - x_{k(j - n)}^{2} - y_{k(j - n)}^{2} } \\ \vdots \\ { - x_{k(j + n)}^{2} - y_{k(j + n)}^{2} } \\ \end{array} } \right],$$
(5)
$$c = x_{c}^{2} + y_{c}^{2} - r^{2} ,$$
(6)
$$r = \sqrt {x_{c}^{2} + y_{c}^{2} - c} .$$
(7)

The circle fitting error \(E_{c}\) is defined as

$$E_{c} = \frac{{\sum\nolimits_{i = j - n}^{j + n} {\left| {\sqrt {(x_{ki} - x_{c} )^{2} + (y_{ki} - y_{c} )^{2} } - r} \right|} }}{2n + 1}.$$
(8)
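A corresponding numpy sketch of the circle fit and its error (Eqs. 4–8), again for illustration only:

```python
import numpy as np

def fit_circle(points):
    """Least-squares fit of (x - xc)^2 + (y - yc)^2 = r^2 to a (2n+1) x 2 array of
    edge points, following Eqs. (4)-(8). Returns (xc, yc, r) and the mean error."""
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([-2.0 * x, -2.0 * y, np.ones(len(points))])
    xc, yc, c = np.linalg.pinv(A) @ (-(x ** 2 + y ** 2))      # Eq. (5)
    r = np.sqrt(xc ** 2 + yc ** 2 - c)                         # Eq. (7)
    error = np.mean(np.abs(np.hypot(x - xc, y - yc) - r))      # Eq. (8)
    return (xc, yc, r), error
```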

If the circle fitting error is larger than the line fitting error (\(E_{l} < E_{c}\)), \(E_{kj}\) belongs to the line, and its position is corrected as follows:

$$x_{kj}^{\prime } = x_{kj} - \frac{{a\left( {ax_{kj} + by_{kj} - 1} \right)}}{{a^{2} + b^{2} }},\quad y_{kj}^{\prime } = y_{kj} - \frac{{b\left( {ax_{kj} + by_{kj} - 1} \right)}}{{a^{2} + b^{2} }}.$$
(9)

Otherwise, if \(E_{l} > E_{c}\), \(E_{kj}\) belongs to the circle, and its position is corrected as follows:

$$\begin{aligned} x_{kj}^{\prime } = & x_{c} + \frac{{(x_{kj} - x_{c} )r}}{{\sqrt {(x_{kj} - x_{c} )^{2} + (y_{kj} - y_{c} )^{2} } }}, \\ y_{kj}^{\prime } = & y_{c} + \frac{{(y_{kj} - y_{c} )r}}{{\sqrt {(x_{kj} - x_{c} )^{2} + (y_{kj} - y_{c} )^{2} } }}. \\ \end{aligned}$$
(10)

All the edges are processed with this enhancement, and the point positions are corrected toward either the fitted line or the fitted circle. The corrected edges are more suitable for the next step, geometric feature extraction.
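The selection and correction step for a single (2n + 1)-point window can be sketched as follows; the inputs correspond to the outputs of the fit_line and fit_circle sketches above, and the line case uses the standard orthogonal projection onto \(ax + by = 1\), which may differ in form from the printed Eq. (9).

```python
import numpy as np

def correct_window(points, line, line_err, circle, circle_err):
    """Snap the centre point of a (2n+1)-point window onto the better-fitting
    primitive: the line if El < Ec, otherwise the circle (cf. Eqs. 9-10)."""
    x, y = points[len(points) // 2]                     # centre point E_kj of the window
    if line_err < circle_err:
        a, b = line                                     # line a*x + b*y = 1
        t = (a * x + b * y - 1.0) / (a ** 2 + b ** 2)
        return x - a * t, y - b * t                     # orthogonal projection onto the line
    xc, yc, r = circle
    d = np.hypot(x - xc, y - yc)                        # assumes the point is not at the centre
    return xc + (x - xc) * r / d, yc + (y - yc) * r / d  # radial projection onto the circle
```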

3 Geometric feature extraction with circle centers and line intersections

In this section, we define the geometric features that are used for creating vector mapping descriptors (VMDs) and for object matching. We define two types of geometric features: (a) circles and their centers, and (b) line segments and their intersection points. The section is divided into two parts, one for each feature type. We begin by detecting the circle information; once this step has been completed, the remaining contours are examined to determine whether they contain any lines. The detected line segments are extended virtually to produce the line intersections used as salient feature points. Figure 5 shows the results of geometric feature extraction.

Fig. 5
figure 5

Results of geometric feature extraction. The yellow dots represent the circle centers detected by the least squares circle fitting algorithm, the green dots represent line intersections, and the red line represents detected line segments from the least squares line-fitting algorithm

3.1 Circle feature detection

The input contour contains information about the sequence of edge points. One object has k contours \(\left\{ {C_{1} , C_{2} , C_{3} , \ldots ,C_{k} } \right\}\), and contour \(C_{i}\) has n edge points \(\left\{ {C_{i} \left( {x_{1} ,y_{1} } \right), C_{i} \left( {x_{2} ,y_{2} } \right), C_{i} \left( {x_{3} ,y_{3} } \right), \ldots , C_{i} \left( {x_{n} ,y_{n} } \right)} \right\}\). The least squares circle-fitting method of Eq. (4) is used to fit the circle \((x - x_{c} )^{2} + (y - y_{c} )^{2} = r^{2}\) along the contours [38], approximating the \(x_{c}\), \(y_{c}\), and c values, where \(c = x_{c}^{2} + y_{c}^{2} - r^{2}\). Each circle-like region found in this way is then confirmed against a defined threshold parameter

$$\left\{ {\begin{array}{*{20}l} {{\text{Circle}}, } \hfill & {\mathop \sum \limits_{j = 1}^{n} r_{j} > {\text{Circle}}\_{\text{Error}}\_{\text{Threshold}}} \hfill \\ {\text{Not circle, }} \hfill & {\text{Otherwise}} \hfill \\ \end{array} } \right.$$

Hence, there are two types of regions: circular and non-circular, and through this process, we obtain information on the circle radii and centers in the defined regions.
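As a self-contained illustration of this confirmation step, the sketch below accepts a contour as circular when its mean fitting residual (Eq. 8) stays below a threshold; the comparison direction and the threshold value are assumptions made for illustration, not the exact Circle_Error_Threshold test used in the implementation.

```python
import numpy as np

def is_circle(contour, error_threshold=0.5):
    """Classify a contour as circular or not from the mean circle-fitting residual.
    Returns (flag, (xc, yc, r))."""
    pts = np.asarray(contour, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([-2.0 * x, -2.0 * y, np.ones(len(pts))])
    xc, yc, c = np.linalg.pinv(A) @ (-(x ** 2 + y ** 2))      # Eqs. (4)-(5)
    r = np.sqrt(xc ** 2 + yc ** 2 - c)                         # Eq. (7)
    error = np.mean(np.abs(np.hypot(x - xc, y - yc) - r))      # Eq. (8)
    return error < error_threshold, (xc, yc, r)
```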

3.2 Line feature detection

3.2.1 Line-like region

Once the circles in the object have been identified, the remaining contour regions are treated as candidate line segments. Again, the object has k remaining contours \(\left\{ {C_{1} , C_{2} , C_{3} , \ldots ,C_{k} } \right\}\), and contour \(C_{i}\) has n edge points \(\left\{ {C_{i} \left( {x_{1} ,y_{1} } \right), C_{i} \left( {x_{2} ,y_{2} } \right), C_{i} \left( {x_{3} ,y_{3} } \right), \ldots ,C_{i} \left( {x_{n} ,y_{n} } \right)} \right\}\). The least squares line-fitting algorithm of Eq. (1) is used to detect line segments. First, a point Ci(xp,yp) is selected from the remaining contours as the starting point of a line segment if the point and its sequence of neighboring points {Ci(xp+1,yp+1), Ci(xp+2,yp+2), …, Ci(xp+q,yp+q)} form a line Li satisfying Eq. (1). Here, p denotes an arbitrary starting index, usually the first point of the contour, and q is the initial number of neighboring points set by the user. Once the starting point Ci(xp,yp) is established, the initial line Li is formed and its seed grows through \(\left\{ {C_{i} \left( {x_{p + (q + 1)} ,y_{p + (q + 1)} } \right),C_{i} \left( {x_{p + (q + 2)} ,y_{p + (q + 2)} } \right), \ldots , \, C_{i} \left( {x_{p + (q + m)} ,y_{p + (q + m)} } \right)} \right\}\) until Eq. (1) is no longer satisfied. This procedure is repeated until all of the remaining contours have been inspected.
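The seed-and-grow procedure can be sketched as follows; the tolerance max_error and the default q are assumptions for illustration.

```python
import numpy as np

def fit_line(points):
    """Least-squares fit of a*x + b*y = 1 (Eqs. 1-3); returns (a, b) and mean error."""
    coeffs = np.linalg.pinv(points) @ np.ones(len(points))
    err = np.mean(np.abs(1.0 - points @ coeffs) / np.hypot(*coeffs))
    return coeffs, err

def grow_line_segments(contour, q=5, max_error=0.3):
    """Seed-and-grow line detection along a contour (sequence of (x, y) points),
    cf. Sect. 3.2.1. q is the initial number of neighbouring points and max_error
    is an assumed tolerance on the fitting error of Eq. (3)."""
    pts = np.asarray(contour, dtype=float)
    segments, p = [], 0
    while p + q < len(pts):
        coeffs, err = fit_line(pts[p:p + q + 1])
        if err > max_error:                       # no line seed here: shift the start point
            p += 1
            continue
        end = p + q
        while end + 1 < len(pts):                 # grow while the fit stays acceptable
            new_coeffs, new_err = fit_line(pts[p:end + 2])
            if new_err > max_error:
                break
            coeffs, end = new_coeffs, end + 1
        segments.append((p, end, tuple(coeffs)))  # (start index, end index, (a, b))
        p = end + 1
    return segments
```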

3.2.2 Line segments with least errors

Noise can cause one long line segment to be broken into several shorter segments, as shown in Fig. 6a. In order to detect a complete long line segment among the contours, the broken segments need to be combined, as shown in Fig. 6b. Each line segment is compared with its neighboring segments using Eq. (1) to determine whether they can be combined or should remain separate, until the detection of line segments is complete. The line fit still contains error, however, which is minimized by shortening the line segments until the fitting error is smallest. The shortening is carried out by trimming the edge pixel points \(\left\{ {C_{i} \left( {x_{p + 1} ,y_{p + 1} } \right), \, C_{i} \left( {x_{p + 2} ,y_{p + 2} } \right), \ldots , \, C_{i} \left( {x_{p + (q + m - 1)} ,y_{p + (q + m - 1)} } \right)} \right\}\) from both ends until the error is minimized. The least squares error parameters can be chosen to limit the range of this trimming [39].

Fig. 6
figure 6

a Line-like regions detected by the least squares line-fitting algorithm, b line segments containing the least errors, and c line intersections
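A greedy sketch of the segment-merging step of Sect. 3.2.2; the tolerance is again an assumption, and the segment format follows the grow_line_segments sketch above.

```python
import numpy as np

def merge_collinear(segments, contour, max_error=0.3):
    """Merge neighbouring line segments whose union still satisfies the line fit of
    Eq. (1) within an assumed tolerance. `segments` are (start, end, (a, b)) tuples."""
    pts = np.asarray(contour, dtype=float)
    merged = []
    for seg in segments:
        if merged:
            s, e = merged[-1][0], seg[1]
            window = pts[s:e + 1]
            coeffs = np.linalg.pinv(window) @ np.ones(len(window))            # Eq. (2)
            err = np.mean(np.abs(1.0 - window @ coeffs) / np.hypot(*coeffs))  # Eq. (3)
            if err <= max_error:                 # both segments lie on one line: merge them
                merged[-1] = (s, e, tuple(coeffs))
                continue
        merged.append(seg)
    return merged
```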

3.2.3 Line intersections

Although we now have lines with the least error, the shortening process leaves the line segments with different lengths, and hence with different starting and ending points; as a result, the segment endpoints cannot be used as salient features, and the vertices are unstable. Hence, we introduce line intersections as another type of vertex: the line segments are virtually extended, and their intersection points are robust to differing line lengths and can therefore be used as salient feature points.
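With each line stored as \(ax + by = 1\), the intersection of two extended segments reduces to a 2 × 2 linear system; a minimal sketch:

```python
import numpy as np

def line_intersection(line1, line2, eps=1e-9):
    """Intersection point of two (virtually extended) lines given as (a, b) with
    a*x + b*y = 1; returns None for near-parallel lines."""
    A = np.array([line1, line2], dtype=float)
    if abs(np.linalg.det(A)) < eps:
        return None                    # nearly parallel: no stable intersection
    x, y = np.linalg.solve(A, np.ones(2))
    return float(x), float(y)
```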

4 Constructing VMDs and the matching algorithm

VMD matching is designed to be invariant to geometric transformation: distance information is invariant to rotation and angle information is invariant to scale, so both distance and angle are used to construct the descriptor. In this section, we describe the three steps of our proposed geometric matching method: constructing the descriptor, one-to-one feature matching, and finding the desired point to complete object matching. First, we build a descriptor that is unique for an object; then, each descriptor of one feature is compared with those of the other features; and lastly, the positions of the objects in the target are calculated according to the points of the model object set by the user.

4.1 Constructing VMDs

The VMD contains unique information about the object, and the uniqueness of each VMD is used to recognize the object in different scenes, where the target objects may be single, multiple, or occluded. Our descriptors are created from the geometric features F defined in the previous section. The relationships between the features are represented as vectors, which are separated into distance descriptors and angle descriptors. Figure 7 shows the vectors relating feature number 1 to the other features, each containing a distance and an angle. Likewise, the remaining features, numbers 2 to 7, form the rest of the VMD. The vector of feature number 1 relative to itself is 0, so every descriptor contains a zero vector for its own feature.

Fig. 7
figure 7

The vector mapping descriptor of feature number 1

A vector between two feature points can be represented as follows:

$$\overrightarrow {v}_{i} = F_{i} - F_{j} ,$$
(11)

where \(i \in (1, \ldots ,n)\) and \(j \in (1, \ldots ,n)\), in which \(n\) represents the number of features.

The Euclidean distance between corresponding feature points \(F_{i}\) and \(F_{j}\) is denoted by

$$d_{ij} = \sqrt {(F_{xi} - F_{xj} )^{2} + (F_{yi} - F_{yj} )^{2} } ,$$
(12)

and its distance descriptor is defined as

$$D = \left[ {{\mathbf{d}}_{1} ,{\mathbf{d}}_{2} , \ldots ,{\mathbf{d}}_{n} } \right],\quad {\text{where}}\quad {\mathbf{d}}_{1} = \left[ {d_{11} ,d_{12} , \ldots ,d_{1n} } \right].$$
(13)

The angle between corresponding feature points \(F_{i}\) and \(F_{j}\) can be calculated as

$$\theta_{ij} = \tan^{ - 1} \frac{{F_{yi} - F_{yj} }}{{F_{xi} - F_{xj} }}.$$
(14)

and its angle descriptor is defined as

$$\varTheta = \left[ {{\varvec{\uptheta}}_{1} ,{\varvec{\uptheta}}_{2} , \ldots ,{\varvec{\uptheta}}_{n} } \right],\quad {\text{where}}\quad {\varvec{\uptheta}}_{1} = \left[ {\theta_{11} ,\theta_{12} , \ldots ,\theta_{1n} } \right].$$
(15)
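A compact sketch of descriptor construction for an n × 2 array of feature points follows; it uses arctan2 instead of the plain arctangent of Eq. (14) to keep the full angular range, which is a minor implementation choice of this illustration.

```python
import numpy as np

def build_vmd(features):
    """Build the distance descriptor D (Eq. 13) and angle descriptor Theta (Eq. 15)
    for an n x 2 array of feature points."""
    F = np.asarray(features, dtype=float)
    diff = F[:, None, :] - F[None, :, :]              # pairwise vectors F_i - F_j, Eq. (11)
    D = np.hypot(diff[..., 0], diff[..., 1])          # Euclidean distances, Eq. (12)
    Theta = np.arctan2(diff[..., 1], diff[..., 0])    # angles, Eq. (14)
    return D, Theta
```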

4.2 One-to-one feature matching using VMD

Object matching requires that one feature in an object correspond to exactly one feature in the other object; thus, the corresponding feature set should contain only unique correspondences between features. Figure 8 shows the VMD of one feature point, in which the circular arcs represent equal distances and the bearings represent the angles.

Fig. 8
figure 8

Representation at one feature point for a model VMD (a) and a target VMD (b)

To achieve one-to-one feature matching, the object descriptor in Eqs. (13) and (15) is divided into feature descriptor distance \(D_{i}\) and feature descriptor angle \(\varTheta_{i}\) and thus

$$D_{i} = \left[ {{\mathbf{d}}_{i} , \ldots ,{\mathbf{d}}_{n} } \right],\quad \varTheta_{i} = \left[ {{\varvec{\uptheta}}_{i} , \ldots ,{\varvec{\uptheta}}_{n} } \right] .$$
(16)

The model image \(I_{M}\) has distance descriptor \(D_{M}\) and angle descriptor \(\varTheta_{M}\), and similarly, the target image \(I_{T}\) also has distance descriptor \(D_{T}\) and angle descriptor \(\varTheta_{T}\). Hence,

$$D_{{M_{i} }} = \left[ {d_{{M_{i1} }} \ldots d_{{M_{in} }} } \right],\quad \varTheta_{{M_{i} }} = \left[ {\theta_{{M_{i1} }} \ldots \theta_{{M_{in} }} } \right],$$
(17)
$$D_{{T_{i} }} = \left[ {d_{{T_{i1} }} \ldots d_{{T_{im} }} } \right],\quad \varTheta_{{T_{i} }} = \left[ {\theta_{{T_{i1} }} \ldots \theta_{{T_{im} }} } \right],$$
(18)

where \(n\) and \(m\) represent the number of features in the model and target objects, respectively.

The scaling and rotation relationships between an arbitrary feature F in image I and the corresponding feature \(\tilde{F}\) in image \(\tilde{I}\) are, respectively, denoted by

$$\sigma_{i} = d_{{M_{ia} }} /d_{{T_{ib} }} ,$$
(19)
$$\theta_{i} = \theta_{{M_{ia} }} - \theta_{{T_{ib} }} .$$
(20)

Once the arbitrary scale factor and rotation angle are calculated, it is assumed that the transformed image \(\tilde{I}\) is enlarged by \(\sigma_{i}\) and rotated by \(\theta_{i}\), and then

$$D_{{M_{i} }} = \sigma_{i} \left[ {d_{{T_{i1} }} \ldots d_{{T_{im} }} } \right],\quad \varTheta_{{M_{i} }} = \theta_{i} + \left[ {\theta_{{T_{i1} }} \ldots \theta_{{T_{im} }} } \right].$$
(21)
The matching error between the two descriptors is then

$$E = \sqrt {\left( {D_{M} - \sigma_{i} D_{T} } \right)^{2} + \left( {\varTheta_{M} - (\theta_{i} + \varTheta_{T} )} \right)^{2} } .$$
(22)

The steps in the VMD algorithm are shown in Fig. 9. If two arbitrary features correspond to each other, the feature descriptor distance and angle are similar to the scaled and rotated feature descriptor distance and angle. Therefore, we calculate the least error E for each pair of corresponding feature points, after which the counter in the accumulator increases and the process continues with the remaining elements in the feature descriptor. Otherwise, we recalculate the scale factor and rotation angle from the remaining elements in the feature descriptor. The process repeats until all elements of one feature descriptor have taken part in the calculation, and then until all of the feature descriptors have been compared. Every iteration yields one-to-one feature point matching. In addition, because the type of each geometric feature is known (Sect. 3), the matching calculation is performed only between features of the same type; this removes unnecessary iterations and reduces computational time. Figure 9b shows the true matches for four feature relations out of six possible ones, and Fig. 9a, c, and d shows examples of false corresponding features where none of them match.

Fig. 9
figure 9

Arbitrary one-to-one feature matching: a the true corresponding features, b correct matching by scale factor and rotational angle, c false corresponding features, and d incorrect matching by scale factor and rotational angle
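The matching loop described above can be sketched as a brute-force voting scheme over descriptor entries, operating on descriptors such as those produced by the build_vmd sketch of Sect. 4.1. The distance and angle tolerances, the vote counting, and the greedy one-to-one assignment are assumptions made for illustration rather than the exact accumulator of the implementation.

```python
import numpy as np

def match_features(D_m, Th_m, D_t, Th_t, dist_tol=3.0, ang_tol=0.05):
    """For each model feature i and target feature j, hypothesise a scale sigma and
    rotation theta from one pair of descriptor entries (Eqs. 19-20); the remaining
    entries vote when a transformed target entry agrees with a model entry
    (Eqs. 21-22). Tolerances are in pixels and radians."""
    n, m = D_m.shape[0], D_t.shape[0]
    votes = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for a in range(n):
                for b in range(m):
                    if a == i or b == j or D_t[j, b] < 1e-9:
                        continue
                    sigma = D_m[i, a] / D_t[j, b]                   # Eq. (19)
                    theta = Th_m[i, a] - Th_t[j, b]                 # Eq. (20)
                    d_err = np.abs(D_m[i][:, None] - sigma * D_t[j][None, :])
                    a_diff = Th_m[i][:, None] - theta - Th_t[j][None, :]
                    a_err = np.abs(np.angle(np.exp(1j * a_diff)))   # wrap to (-pi, pi]
                    agree = np.any((d_err < dist_tol) & (a_err < ang_tol), axis=1)
                    votes[i, j] = max(votes[i, j], float(agree.sum()))
    # greedy one-to-one assignment on the vote matrix
    pairs, used_i, used_j = [], set(), set()
    for i, j in sorted(np.ndindex(n, m), key=lambda ij: -votes[ij]):
        if i not in used_i and j not in used_j and votes[i, j] > 0:
            pairs.append((i, j))
            used_i.add(i)
            used_j.add(j)
    return pairs, votes
```

Restricting the candidate pairs to features of the same geometric type, as noted in the text, would prune most of the iterations of this brute-force sketch.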

4.3 Completed object matching and desired detection point

The geometric matching between the sets of corresponding feature points in two different images is derived from the rotation angle and distance (Fig. 10). Once all sets of corresponding feature points have been obtained, the number of feature points in the target image is compared with that in the model image, with the model features serving as the reference. The desired point \((\tilde{x}_{i} ,\tilde{y}_{i})\) in the target image is calculated from the characteristics of each corresponding feature point: the x and y direction vectors \(\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\rightharpoonup}$}} {v}_{xi}\) and \(\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\rightharpoonup}$}} {v}_{yi}\) from the model, the rotation angle \(\theta_{i}\), and the feature point coordinates \(F_{xi}\) and \(F_{yi}\) from the target. Hence, the desired point of the original image is detected in the transformed image by

$$\begin{aligned} \tilde{x}_{i} = \cos \left( {\theta_{i} } \right) *\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\rightharpoonup}$}} {v}_{xi} - \sin \left( {\theta_{i} } \right) *\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\rightharpoonup}$}} {v}_{yi} + F_{xi} , \hfill \\ \tilde{y}_{i} = \sin \left( {\theta_{i} } \right) *\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\rightharpoonup}$}} {v}_{xi} + \cos \left( {\theta_{i} } \right) *\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\rightharpoonup}$}} {v}_{yi} + F_{yi} . \hfill \\ \end{aligned}$$
(23)
Fig. 10
figure 10

The VMD algorithm
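For a single corresponding feature pair, Eq. (23) reduces to a rotation of the stored model offset followed by a translation to the target feature, for example:

```python
import numpy as np

def desired_point(v_model, theta, feature_target):
    """Transfer a user-defined model point into the target image following Eq. (23):
    the model offset vector (vx, vy) is rotated by the estimated angle theta and
    added to the corresponding target feature point (fx, fy). For scale-changed
    targets, the offset could additionally be multiplied by the estimated scale
    factor; Eq. (23) itself uses only the rotation."""
    vx, vy = v_model
    fx, fy = feature_target
    x = np.cos(theta) * vx - np.sin(theta) * vy + fx
    y = np.sin(theta) * vx + np.cos(theta) * vy + fy
    return x, y
```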

5 Experiments

The experiments were carried out in the following situations: clear objects in the image, objects with deformed features, and occluded objects. These three situations were tested on both image datasets and a real-time camera.

5.1 Experimental environment

The experimental environment was set up as an industrial inspection system with a mono-color background and simple objects for recognition. The mono-color background allows a clear separation between the background and the foreground objects, which helps the edge detection method described in Sect. 2. Clear edges are closely related to the feature accuracy described in Sect. 3 and thus lead to the desired feature detection (circle centers and line intersections). To keep the computational time to a minimum, we limited the line-intersection feature to an edge and its neighbor only. Finally, the VMD algorithm was run to identify the relationships between the geometric features.

5.2 Matching accuracy experimental results

The matching experiments began with non-occluded inputs of two types: full-featured objects and deformed objects with missing or additional features. Next, occluded inputs were created from the deformed objects, and objects at various scales were tested as well. Figure 11 presents the tested models.

Fig. 11
figure 11

The tested model images

5.2.1 Matching accuracy for non-occluded input

The first experiment addressed the situation where the features of the model are similar to those of the target, implying clear object recognition. The number of features and their locations on the objects were very close but not identical, since the different images contain noise and uncertainty. Deformed features on the target can still be matched with the model within an allowable range, whereas missing feature points mean less feature information and hence fewer corresponding features, possibly in different locations. Figure 12 shows an object in the model image whose features are located symmetrically about the circle center, while the corresponding object in the target image has different feature locations (two extra features on its left leg and two missing features at the top right). This deformation is due to the noise and uncertainty in each image, which interfere with the feature detection process. Figure 12 shows that the matching process still performed well in this situation, so we were able to find the desired points. According to the matching process (Fig. 10), the model and target objects were identified as identical, and the user-desired points were detected, as shown in Fig. 13 (Table 1).

Fig. 12
figure 12

Finding the desired point using the vector from each corresponding feature (the model and target have different feature points; the target has two missing feature points on its right shoulder and two unwanted feature points on its left leg)

Fig. 13
figure 13

Non-occluded object results

Table 1 Comparison of the three methods for a non-occluded object

As shown in Table 1, the proposed algorithm had the highest intersection over union (IoU) rate and the lowest standard deviation (SD) among the compared methods. Even though the Matrox Imaging Library (MIL) [30] geometric model finder is a strong tool, the proposed algorithm improved on its IoU rate by 1.72% (Fig. 14). The speeded-up robust features (SURF) algorithm had the lowest IoU rate, as expected from its limitations. The lower SD of the proposed method also indicates that its registration results were closer to the mean (Table 2).

Fig. 14
figure 14

An example of registration results for a non-occluded object: a proposed, b MIL, and c SURF

Table 2 An example of registration results of the three methods for a non-occluded object

5.2.2 Matching accuracy for occluded input

This experiment was conducted in a situation where the objects overlap within the allowable range of feature detection, an extension of the previous experiment. If objects overlap, desired features may be missing and unwanted features may appear owing to the high level of deformation. Another difference from the previous experiment is that multi-object recognition was carried out instead of single-object recognition.

In this case, multiple detections are required in the region of interest. First, we need to check whether occlusion is actually present, using a parameter for the maximum and minimum scale factor, which must be reasonable for an industrial inspection system. The maximum feature distance in the model, scaled by this parameter, is compared with the distances in the target; a target distance exceeding the scaled maximum model distance cannot belong to a single object, which distinguishes multiple objects from a single one. As shown in Fig. 15, the maximum distance between model features was used as the standard for separating multiple objects. The group of features a, b, c, d, e, f, and j were candidates according to the maximum distance (A–E) in the model, while the group of features d, e, f, g, h, and i were other candidates. Thus, we could infer that the target possibly contained two objects (Fig. 15).

Fig. 15
figure 15

Distinguishing multiple objects. The model definition is on the left and the occluded target on the right
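A possible sketch of this grouping step is shown below; the greedy seeding and the scale bound max_scale are assumptions made for illustration only.

```python
import numpy as np

def group_by_model_extent(target_features, model_features, max_scale=1.3):
    """Group target features into object candidates using the maximum pairwise
    feature distance of the model as a radius bound (cf. Sect. 5.2.2)."""
    T = np.asarray(target_features, dtype=float)
    M = np.asarray(model_features, dtype=float)
    d_model = max(np.hypot(a[0] - b[0], a[1] - b[1]) for a in M for b in M)
    limit = max_scale * d_model                      # allowed extent of a single object
    groups = []
    for seed in T:
        members = tuple(k for k, f in enumerate(T)
                        if np.hypot(f[0] - seed[0], f[1] - seed[1]) <= limit)
        if members not in groups:                    # keep each distinct candidate group once
            groups.append(members)
    return groups
```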

If occlusion exists, the VMD of the model object is compared with the objects in the target until nothing is left to compare. If two objects overlap, the first iteration of descriptor comparison identifies one object, and the other object is matched during the second iteration. If more than two objects are occluded, the same procedure is executed until there is nothing left to compare. Therefore, our method can handle occlusion, as shown in Figs. 16 and 17.

Fig. 16
figure 16

Occluded object results

Fig. 17
figure 17

An example of registration results for an occluded object: a proposed, b MIL, and c SURF

Table 3 shows that the proposed algorithm had the highest IoU rate even though the object was occluded. However, occlusion led to a lower IoU rate than in the non-occluded case because of the loss of image data (Fig. 17), and to a higher SD as well. The proposed algorithm still outperformed the others in both accuracy and precision (Table 4).

Table 3 Comparison of the three methods for an occluded object
Table 4 An example of registration results of the three methods for an occluded object

5.2.3 Matching accuracy at various scale levels

This experiment was conducted to determine the effect of varying scale on object recognition and registration. Changes of scale may shift the positions of features; however, their distance ratios remain proportional to the original model, and angles have the advantage of being scale invariant. Therefore, we used angles to determine the correspondence of features and distances to recover the scale factor (Figs. 18, 19). We scaled the model images both up and down at the following levels: ± 10, ± 20, and ± 30%. Recognition still succeeded at ± 10 and ± 20%, with a higher IoU rate at ± 20% than at ± 10%. Furthermore, our method was also successful at the ± 30% scale level of the model image, with a higher IoU rate than at the other scale levels (Tables 5, 6, and 7).

Fig. 18
figure 18

Results of the objects at various scale levels

Fig. 19
figure 19

Example of the registration results for a scale-altered object: a proposed, b MIL, and c SURF

Table 5 IoU rates comparison of the three methods at various scale levels
Table 6 SD comparison of the three methods at various scale levels
Table 7 Example of registration results for the three methods for a scale-altered object

The proposed algorithm had the highest IoU rate, 3.75% higher than MIL and 42.13% higher than SURF, and once again had the lowest SD among the three algorithms. The ± 20% scale difference yielded the highest IoU rate, indicating that image quality was more significant than the scale ratio. Therefore, the proposed algorithm outperformed the other methods in both accuracy and precision on images at various scale levels.

6 Conclusions

Image matching and registration are the foundation of many computer vision systems and of applications such as navigation and security surveillance through the recognition of desired objects. In this study, we started from edge detection to show how to obtain fine edges from coarse images, which yields robust geometric features for the proposed VMD algorithm that solves the object recognition problem; the parameter governing the allowed occlusion percentage can be adjusted. If this parameter is set high, the method can handle heavier occlusion, but the accuracy decreases, and vice versa. Based on the results, we showed that the proposed matching method is invariant to arbitrary geometric transformation: translation, rotation, scale change, extra or missing feature points, and occlusion. The computational speed of the matching algorithm is also fast enough for an industrial recognition system. Moreover, the VMD algorithm can be applied in any case where the features of an object in the model and target are similar. The experimental results show that the proposed algorithm outperformed two previously reported methods in both accuracy and precision.

In future work, we will improve the performance of the VMD algorithm by finding optimal parameters for the recognition system. Our research will focus on 3D datasets with and without occlusion, in which distance and angle descriptors extended to the x, y, and z axes may be part of the solution. Another area of interest is representing 3D images as 2D images by projection onto a plane.