1 Introduction

Point set registration [1,2,3,4] is a fundamental topic and plays a significant role in machine learning and data mining, and it frequently arises in science tasks, such as augmented reality [5, 6], image mosaicking [7, 8], pattern recognition [9,10,11], and image fusion [12]. These feature-based image registration tasks work with point features extracted from image pairs, where such a point feature implies some geometric invariance between the images. Image registration can thereby be reduced to a feature point-based registration problem that involves both estimating reliable correspondences and recovering the optimal transformation parameters.

Based on the type of spatial transformations considered, point set registration problems can be divided into problems of rigid registration and nonrigid registration. Rigid registration usually involves rotation, translation, and scaling, and excellent performance has been achieved for this type of registration (e.g., [13,14,15]). By contrast, nonrigid registration addresses data with various types of degradation (deformation, noise, occlusion, and outliers). It is more challenging because the correspondences between the two point sets are more difficult to establish, and the underlying nonrigid spatial transformations are complicated. However, nonrigid registration arises in many practical engineering, including criminal detection based on palmprint recognition [16], facial expression recognition [17], visual navigation [18], etc. In this paper, we focus on nonrigid registration problem.

A primary challenge in nonrigid registration is how to construct the initial correspondences from the degraded point sets. Here, reliable correspondences are used to refine the optimal spatial transformation, and vice versa. The initial correspondences are usually established based on only local descriptors, e.g., SIFT [19] or shape context (SC) [20], which are inherently sensitive to degradation and hence will give rise to some uncertain correspondences. In brief, the initial correspondences are contaminated by outliers (incorrect correspondences). The goal of transformation estimation is to partition the initial correspondences into inliers (correct correspondences) and outliers in accordance with the data points’ similarities. However, the existence of outliers makes the registration problem more difficult because spatial transformation learning is a nonlinear registration problem that cannot be modeled in a simple parametric manner. Thus, additional constraints are needed to preserve the global structure and intrinsic geometric structure of the point sets during registration.

To address the above problems, we propose an innovative method for nonrigid point set registration in which the registration problem is converted into a Gaussian mixture model (GMM)-based probability density estimation problem. This method is specifically designed to find an optimal alignment between two given point sets. More precisely, we first construct the initial correspondences using local descriptors. Then, we develop a flexible and robust registration framework for distinguishing inliers from outliers. The key idea of the framework is to employ two complementary constraints in a unified GMM-based optimization framework. One is a thin-plate spline (TPS)-based regularization constraint, which is used to maintain global smoothness during the estimation of the underlying transformation. The other constraint is an intrinsic structure-based geometrical constraint, which is applied to preserve the intrinsic geometry of the transformed set. A joint iterative optimization process that links these two constraints under a GMM-based framework is achieved by using the expectation maximization (EM) [21] algorithm to automatically identify inliers. Moreover, the TPS approach [22] is used to parameterize the spatial transformation, the spline tool TPS has a close-form solution which can be decomposed into affine and nonaffine subspaces. It can produces a smooth functional mapping in the context of supervised learning and has no free parameters that need manual tuning. To generate a smooth spatial mapping between two given point sets with one-to-one correspondences, we choose the TPS for parameterization.

The remainder of this paper is organized as follows. Section 2 reviews related works and the problem we address. Section 3 describes our method in detail. Section 4 presents experimental results and performance comparisons between our approach and other approaches, followed by a conclusion in Sect. 5.

2 Related work

Feature-based image registration is a classical problem that has attracted considerable attention and has been under development for a long time. As mentioned earlier, a feature point-based approach is typically applied in which a local descriptor is used to construct initial corresponding point pairs, and the underlying transformation is then solved for to remove outliers. Here, several theories will be briefly reviewed, and our contributions will be mentioned afterward.

In local descriptor-based methods, a set of initial feature correspondences is built for the extracted point sets by using local neighborhood information. The commonly used descriptors include SIFT, SC, and FPFH [23], the application of which to construct feature correspondences has already been demonstrated. In addition, Hauagge and Snavely [24] presented a technique for finding correspondences between difficult image pairs that are rich in symmetries. Mikolajczyk and Schmid [25] evaluated the performance of a variety of descriptors for local regions of interest and proposed an extension of the SIFT descriptor. However, these local descriptors consider only local neighborhood information, and the initial correspondences usually are either too sparse or contain unavoidable outliers because of ambiguity.

Feature point-based registration is used to align two point sets together and determine the inliers from the initial correspondences. One of most popular registration methods is the iterative closest point (ICP) method [13], which iteratively assigns correspondences between two surface patches under Euclidean transformation. However, as an iterative method, the ICP method itself requires coarse prealignment, without which the method may tend to become trapped in local minima. More critically, such an adequate set of initial poses is no longer valid for nonlinear registration. To alleviate the issue of local minima, various improvements and variants of the standard ICP method have been presented [26,27,28,29]. Although these methods can improve the convergence properties of the standard ICP method, they still do not achieve high robustness. Furthermore, a robust feature point-based registration framework based on modeling the transformation with thin-plate spline robust point matching (TPS-RPM) has been presented [30], in which deterministic annealing and soft-assignment techniques are used to learn the transformation. Although the TPS-RPM method is more robust than the ICP method, it has a higher computational complexity. Under this same framework, Tsin and Kanade [31] proposed another trick for point set registration. The kernel correlation (KC) method, also called multiply linked ICP, considers the proportional correlation of two kernel density functions. A later improved approach presented by Jian and Vemuri [32] is based on the KC method; it attempts to model each point set as a mixture of Gaussians and then estimate the transformation by minimizing the discrepancy between the two Gaussian mixtures. However, it is sensitive to outliers. Based on motion coherence theory (MCT) [33, 34], the coherence point drift (CPD) method [35] was introduced and later improved in [36]. In this method, the registration problem is formulated as a maximum likelihood GMM estimation problem with a regularization term for preserving the global topological structure. However, for the CPD method, it is necessary to estimate the underlying number of Gaussian components, and this method is still sensitive to outliers and noise.

Recently, several methods have been proposed for the detection of outliers through the embedding of spatial constraints and restriction of the solution space. Yang et al. [37] proposed a flexible method called global and local mixture distance with TPS (GLMDTPS), which treats global and local structural differences as a linear assignment problem. The performance of this method which consider only the geometric structure is limited by the assumption that corresponding points have similar structure. Ma et al. [38] proposed a new approach for vector field learning with application to mismatch removing. The SC descriptor was used as local descriptor to assign the membership probabilities, and the EM method was used to solve the regularized mixture model. Yang et al. [39] proposed using the standrad TPS regularization term as prior knowledge to preserve the global topology. However, this approach ignores the intrinsic geometry of the point sets. Wang et al. [40] presented a spatially constrained Gaussian fields (SCGF) method for nonrigid point set registration. They utilized the inner distance shape context (IDSC) [41] to construct the correspondences and used graph Laplacian regularization to constrain the geometric structure of the input space. However, when facing a large degree of degradation, this method is highly time consuming and can easily become trapped in a local minimum. Ma et al. [42, 43] presented an efficient approach based on manifold regularization for the nonrigid registration of shape patterns. However, their method is not robust when confronted with some degree of noise and outlier degradation. Wang et al. [44] formulated the retinal image registration problem as a probabilistic model, which can be used to learn a Gaussian field estimator to achieve robust estimation. Ge and Fan [45, 46] designed a joint optimization approach that combines the CPD constraint and a local linear embedding (LLE) constraint in a GMM-based framework to cope with highly articulated nonrigid deformations, although this method can utilize local structural features more efficiently, the extension performs poorly on data with inhomogeneous density and it more sensitive to outliers because outliers would seriously affect the descriptions of local structures. Thus, their performance degrades in complicated registration problems.

In addition, Ma et al. [47] introduced a simple method for robust feature registration for remote sensing images, called guided locality preserving matching (GLPM), which utilizes the neighborhood structures of potential inliers between the pair images to be registered. Ma et al. [48] introduced a learnable match classifier called LMR for mismatch removal. The classifier constructs match representation for putative matches based on the consensus of local neighborhood structures. Sedaghat and Mohammadi [49] proposed a robust local feature-based registration approach for high-resolution images that is based on the improved SURF detector and localized graph transformation matching. Ma et al. [50] proposed a robust feature guided GMM for image matching which can handle both rigid and nonrigid transformation robustly. Jiang et al. [51] proposed an approach called RFM-SCAN for feature matching, this paper casts the feature matching as a spatial clustering problem with outliers exploiting machine learning techniques. Lati [8] proposed an image mosaicking method for unmanned aerial vehicle images based on random sampling consensus (RANSAC) and bidirectional approaches with a fuzzy inference system.

In summary, the previously mentioned methods have made notable contributions to the field of point set registration, but few of these approaches can achieve good performance when confronted with some degree of degradation. In contrast to the above methods, our proposed method is more robust to degradation than previous methods. The major contributions of the approach proposed in this paper are listed as follows. (1) Global and intrinsic geometric constraints are jointly encoded in a GMM-based registration framework with the aim of preserving the topological structure of the image information. (2) An alternating optimization strategy is used to update the transformation parameters and efficiently eliminate outliers. (3) When this method is applied to contour images and real-world images, the experimental results demonstrate that the method can successfully handle multiple types of degradation.

3 Method

This section elaborates on our method for nonrigid point set registration, which aims to estimate reliable correspondences and learn the underlying transformation between two given point sets subject to ordering information for the model point set \( \text{X} = \{ x_{i} \}_{i = 1}^{N} \in {\mathbb{R}}^{N \times D} \) and the scene point set \( \text{Y} = \{ y_{j} \}_{j = 1}^{M} \in {\mathbb{R}}^{M \times D} \), where \( M \) and \( N \) denote the number of points in each point set, respectively, and \( D \) is the dimension of the points. The model point set is typically perturbed by deformation, noise, occlusion, and outliers; the hope is that the inliers will still show the maximum pointwise overlap with the scene set. In the context, we consider the global connectivity and intrinsic geometry between two given point sets and propose an efficient nonrigid registration method that tries to identify the inliers among initial correspondences. Specifically, we incorporate two complementary constraints into a GMM-based probabilistic framework. The first constraint is a TPS-based global motion coherence constraint, and the second constraint is a spectral graph-based constraint for intrinsic manifold structure preservation. We can solve the registration problem by minimizing the objective function deduced from the GMM-based problem formulation and these two topological constraints.

3.1 Correspondence estimation

The SC, as an effective descriptor for a point set, is a measure of the distribution of the relative positions of neighboring points. This distribution is defined as a joint histogram in which each histogram axis represents a parameter in polar coordinates [52]. The SC descriptor can be used to search for matches in both 2D and 3D cases.

Furthermore, the local relationship among the neighboring points is well preserved, thus making the matching problem tractable. Logarithmic distance and polar angle bins are used to capture the coarse location information, and these bins are uniform in log-polar space, which makes this feature descriptor more sensitive to the positions of neighborhood points than to those of points farther away [53]. By applying this strategy to establish point correspondences, we can obtain a set \( \text{S} = \{ x_{l} ,y_{l} \}_{l = 1}^{L} \) of putative correspondences between the model and scene point sets, where \( L \le \hbox{min} \left\{ {M,N} \right\} \) is the number of correspondences.

3.2 GMM-based nonrigid registration framework

The putative correspondence set \( \text{S} = \left\{ {x_{l} ,y_{l} } \right\}_{l = 1}^{L} \) will inevitably include mismatches with some unknown outliers. Let \( {\tilde{\text{X}}} = \{ x_{l} \}_{l = 1}^{L} \) and \( {\tilde{\text{Y}}} = \{ y_{l} \}_{l = 1}^{L} \). The task is to align \( {\tilde{\text{X}}} \) with \( {\tilde{\text{Y}}} \) (e.g., \( y_{l} = \text{T}(x_{l} ) \)) and identify inliers, where \( \text{T} \) is the optimal transformation function.

In this paper, we focus on identifying as many inliers as possible while achieving satisfactory precision. To this end, we consider a probabilistic mixture model for nonrigid registration. We suppose that the inliers are subject to a multivariate Gaussian distribution with equal isotropic covariances \( \sigma^{2} \text{I} \) and that the outliers satisfy a uniform distribution \( 1/a \) with a positive constant \( a \). Then, we associate a latent variable \( z_{l} \in \{ 0,1\} \) with the \( l \)-th corresponding point pair, where \( z_{l} = 1 \) for inliers and \( z_{l} = 0 \) for outliers. We use homogeneous coordinates for each point, e.g., \( x_{l} = [x_{l} |1]_{1 \times 3} \), where \( 1 \) denotes a matrix element. Thus, the joint probabilistic model can be written as:

$$ \begin{aligned} p({\tilde{\text{Y}}}|{\tilde{\text{X}}},\theta )\; = \prod\limits_{l = 1}^{L} {\;\sum\limits_{{z_{l} }} p } (y_{l} ,z_{l} |x_{l} ,\theta ) \\ \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; = \prod\limits_{l = 1}^{L} {\left[ {\frac{1 - \omega }{a} + \frac{\omega }{{2\pi \sigma^{2} }}\text{e}^{{ - \frac{{||y_{l} - \text{T}(x_{l} )||^{2} }}{{2\sigma^{2} }}}} } \right]} \\ \end{aligned} $$
(1)

where \( \theta = \{ \text{T},\omega ,\sigma^{2} \} \) is a set of unknown latent variables and \( \omega \in [0,1] \) is the mixing coefficient. In accordance with Bayes’ theorem, we consider the maximum a posteriori estimate of \( \theta \), i.e., \( \theta^{*} = \arg \max_{\theta } p({\tilde{\text{Y}}}|{\tilde{\text{X}}},\theta ) \), which can be determined by minimizing the following energy function:

$$ E(\theta ) = - p({\tilde{\text{Y}}}|{\tilde{\text{X}}},\theta ) = - \sum\limits_{l = 1}^{L} {\ln \sum\limits_{{z_{l} }} {p(y_{l} ,z_{l} |x_{l} ,\theta )} } $$
(2)

Following the EM algorithm [21], an alternating iterative strategy is considered to address the latent variables. We can find the complete-data logarithm likelihood function of (2) (E step) as follows:

$$ \begin{aligned} {\mathcal{Q}}(\theta ,\theta^{old} ) = - \frac{1}{{2\sigma^{2} }}\sum\limits_{l = 1}^{L} {p_{l} ||y_{l} - \text{T(}x_{l} \text{)}||^{2} } \\ \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; - \ln \sigma^{2} \sum\limits_{l = 1}^{L} {p_{l} } + \ln \omega \sum\limits_{l = 1}^{L} {p_{l} } \\ \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + \ln (1 - \omega )\sum\limits_{l = 1}^{L} {(1 - p_{l} )} \\ \end{aligned} $$
(3)

where \( p_{l} = p(z_{l} = 1|x_{l} ,y_{l} ,\theta^{old} ) \) is a posterior probability indicating the degree to which \( (x_{l} ,y_{l} ) \) is an inlier. Based on Bayes’ rule, this probability can be obtained by using the old transformation variables as follows:

$$ p_{l} = \frac{{\exp \left( { - \frac{{||y_{l} - \text{T}(x_{l} )||^{2} }}{{2\sigma^{2} }}} \right)}}{{\exp \left( { - \frac{{||y_{l} - \text{T}(x_{l} )||^{2} }}{{2\sigma^{2} }}} \right) + \frac{{2\pi \sigma^{2} (1 - \omega )}}{\omega a}}} $$
(4)

Here, we define a threshold \( \varsigma \) for identifying inliers \( (p_{l} > \varsigma )_{l = 1}^{L} \) after the EM algorithm converges or reaches some termination condition.

The variables are re-estimated, \( \theta^{new} = \arg \max_{\theta } {\mathcal{Q}}(\theta ,\theta^{old} ) \), on the basis of the current \( p_{l} \). Taking the derivatives of Eq. (3) with respect to \( \sigma^{2} \) and \( \omega \) and setting them equal to zero yields the following (M step):

$$ \sigma^{2} = \frac{{\text{tr[}({\tilde{\text{Y}}} - \text{T}({\tilde{\text{X}}}))^{\text{T}} {\text{P}(\tilde{\text{Y}}} - \text{T}({\tilde{\text{X}}})\text{)]}}}{{2 \cdot \text{tr}(\text{P})}} $$
(5)
$$ \omega = \frac{{\text{tr}(\text{P})}}{N} $$
(6)

where \( \text{tr(} \cdot \text{)} \) denotes the trace and \( \text{P} = diag(p_{1} , \cdots ,p_{L} ) \) and \( \text{T}({\tilde{\text{X}}}) = (\text{T}(x_{1} ), \cdots ,\text{T}(x_{L} ))^{\text{T}} \). Clearly, we can consider only the relevance of the transformation \( \text{T} \) and simplify the function (3) to obtain a weighted least squares error functional:

$$ {\mathcal{Q}}(\text{T}) = \frac{1}{{2\sigma^{2} }}\sum\limits_{l = 1}^{L} {p_{l} ||y_{l} - \text{T}(x_{l} )||^{2} } $$
(7)

Function (7) is similar to the moving least square method [54], which can be used to solve for the rigid-body transformation \( \text{T} \). However, for correlates of coherent visual motion such as changes in facial expression and multiview image sequences, for which the feature points are extracted in the presence of degradations, the principle of as-rigid-as-possible image registration may not work well. The nonrigid registration problem is essentially a regression problem. It may turn out that the problem of directly solving for the transformation \( \text{T} \) in (7) is ill-posed since no unique solution is available. Regularization techniques, which usually operate in a reproducing kernel Hilbert space (RKHS), are often used to produce reasonable solutions to ill-posed problems.

3.3 Global structure preservation

In order to complete the transformation estimation for the nonrigid registration, we convert the formulation into the nonrigid case by applying regularization theory and the optimal nonrigid transformation \( \text{T} \) can then be estimated by minimizing a weighted regularized least squares error functional:

$$ {\mathcal{E}}(\text{T}) = \frac{1}{{2\sigma^{2} }}\sum\limits_{l = 1}^{L} {p_{l} ||y_{l} - \text{T}(x_{l} )||^{2} } + \frac{\gamma }{2}\phi (\text{T}) $$
(8)

where \( \gamma \ge 0 \) is a fixed regularization parameter that controls the trade-off and \( \phi (\text{T}) \) is the registration term. The transformation \( \text{T} \) is a vector-valued function that lies in a RKHS. Thus, the functional \( \phi \) has the form \( \phi (\text{T}) = ||\text{T}||_{{\mathcal{H}}}^{2} \), where \( {\mathcal{H}} \) is the RKHS and \( ||\; \cdot \;||_{{\mathcal{H}}} \) denotes the norm \( {\mathcal{H}} \).

To complete the least squares estimation of the transformation, we consider a nonrigid transformation function of a specific form, namely a TPS to generate a smooth mapping fitting. A TPS is a general purpose spline tool for representing a flexible nonrigid transformation, which has the advantages of free parameters and a closed-form solution. We model the transformation \( \text{T} \) using the TPS approach, in which the resulting transformation can be decomposed into a global affine transformation and a local nonaffine warping component:

$$ \text{T(}x_{l} \text{) = }x_{l} \cdot \text{H + }k\text{(}x_{l} ,x\text{)} \cdot {\mathcal{C}} $$
(9)

where \( \text{H} \) is defined as a \( 3 \times 3 \) global affine matrix, \( {\mathcal{C}} \) is a \( L \times 3 \) nonaffine coefficient matrix, and the radial basis function \( k(x_{l} ,x) \) is a \( 1 \times L \) vector defined by the TPS kernel for each point, i.e., \( k(x_{l} ,x) = ||x_{l} - x||^{2} \log ||x_{l} - x|| \). The TPS kernel matrix \( {\mathcal{K}}_{L \times L} \) is defined, with each element expressed as \( {\mathcal{K}}_{lp} = k(x_{l} ,x_{p} ) = ||x_{l} - x_{p} ||^{2} \log ||x_{l} - x_{p} || \), where \( x_{p} \) is a point in \( {\tilde{\text{X}}} \).

In motion perception, the concept of motion coherence [33, 34] is applied to intuitively interpret points close to one another as tending to move coherently in space. Motion coherence is a particular way of imposing smoothness on a spatial transformation. We can regularize the local nonaffine component \( k\text{(}x_{l} ,x\text{)} \cdot {\mathcal{C}} \) to enhance the motion coherence and preserve the global structure; then, the regularization term has the form:

$$ \begin{aligned} ||\text{T}||_{{\mathcal{H}}}^{2} = \left\langle {\sum\limits_{l = 1}^{L} {k(x_{l} ,x){\mathcal{C}},\sum\limits_{l = 1}^{L} {k(x_{l} ,x){\mathcal{C}}} } } \right\rangle_{{\mathcal{H}}} \\ \;\;\;\;\;\;\;\; = \sum\limits_{l = 1}^{L} {\left\langle {k(x_{l} ,x_{l} ){\mathcal{C}},{\mathcal{C}}} \right\rangle } = \text{tr}({\mathcal{C}}^{\text{T}} {\mathcal{K}\mathcal{C}}) \\ \end{aligned} $$
(10)

Substituting Eq. (10) into Eq. (8), we can obtain a TPS energy function as follows:

$$ {\mathcal{E}}({\text{H}},{\mathcal{C}}) = \frac{1}{{2\sigma^{2} }}||\text{P}^{{{1/2}}} \left( {{\tilde{\text{Y}}} - {\tilde{\text{XH}}} - {\mathcal{KC}}} \right)||^{2} + \frac{\gamma }{2}\text{tr}({\mathcal{C}}^{\text{T}} {\mathcal{KC}}) $$
(11)

where the second term is the standard TPS regularization term, which is used to maintain the overall spatial relationship of the point sets during registration. It is solely dependent on the nonaffine subspace.

The transformation learning method applied here is a dimension reduction technique on the complete subspace, which aims to produce a smooth function \( \text{T} \). However, it fails to discover the underlying intrinsic geometry of the marginal space \( \text{X} \), which is essential in real applications.

3.4 Intrinsic structure preservation

The main aim of nonrigid registration is to estimate a transformation that maintains the global connectivity and associativity of the entire point set and preserves the intrinsic manifold structure during the transformation. In most nonrigid registration tasks, the success of registration learning for some number of the total set of point pairs (i.e., \( L \le N \)) can be plausibly achieved through the effective use of the labeled point pairs \( \{ x_{l} ,y_{l} \}_{l = 1}^{L} \). However, the remaining \( N - L \) unlabeled point pairs may contain additional structure information, which can impose smoothness conditions on possible solutions. To preserve the intrinsic geometry of the data space, leveraging recent advances in spectral graph theory, we attempt to extend the energy function in Eq. (11) by incorporating ensemble graph Laplacian regularization, thereby ensuring that the transformation is smooth with respect to both the RKHS \( {\mathcal{H}} \) and the marginal distribution.

Motivated by the manifold assumption [55], if two feature points \( x_{i} \) and \( x_{j} \) are close in the intrinsic geometry of the marginal distribution on the entire set of transformation data \( {\mathcal{T}}(\text{X}) = ({\mathcal{T}}(x_{1} ), \cdots ,{\mathcal{T}}(x_{N} ))^{\text{T}} \), then the representations of these two points in the new basis should also be close when the Laplacian regularization term is minimized. The graph Laplacian [56] can be used to discretely approximate the manifold Laplacian \( ||{\mathcal{T}}||_{{\mathcal{M}}}^{2} \), which measures the smoothness of \( {\mathcal{T}} \) along the data manifold on the model point set \( \text{X} \).

Consider a graph that is conventionally defined as \( {\mathcal{G}} = \{ {\mathcal{T}},r,\text{W}\} \), where \( {\mathcal{T}} \) is a set of vertices that can be written in matrix form as \( {\mathcal{T}} = \text{XH + KC} \), where \( \text{K}_{N \times N} \) is the Gram matrix and \( \text{K}_{ij} = k(x_{i} ,x_{j} ) \), \( \text{C}_{N \times 3} \) is the matrix of coefficients; \( r \), which is a constant parameter, is used to compute the weights of the edges of the graph, and \( \text{W} \) is an edge-weight matrix, with \( \text{W}_{ij} \) being the weight of the edge connecting vertices \( {\mathcal{T}}(x_{i} ) \) and \( {\mathcal{T}}(x_{j} ) \):

$$ \begin{aligned} \text{W}_{ij} = \left\{ {\begin{array}{*{20}c} {\text{e}^{{ - \frac{1}{r}||{\mathcal{T}}(x_{i} ) - {\mathcal{T}}(x_{j} )||^{2} }} } & {{if||\mathcal{T}}(x_{i} ) - {\mathcal{T}}(x_{j} )||^{2} \le r} \\ 0 & {\text{otherwise}} \\ \end{array} } \right. \\ \text{s.t.}\;\;\;{\mathcal{T}}(x_{j} ) \in {\mathcal{N}}({\mathcal{T}}(x_{i} )) \\ \end{aligned} $$
(12)

where the neighborhood \( {\mathcal{N}}({\mathcal{T}}(x_{j} )) \) is the set of vertices connected to vertex \( {\mathcal{T}}(x_{i} ) \) by edges. According to \( \text{W} \), the graph Laplacian is a real symmetric matrix, and it has a complete set of orthonormal eigenvectors that can be defined as \( \text{A} = \text{D} - \text{W} \), where \( \text{D} \) is a diagonal matrix with elements \( \text{D}_{ii} = \sum\nolimits_{i = 1}^{N} {\text{W}_{ij} } \). Thus, the data manifold Laplacian can be preserved by minimizing the following:

$$ ||{\mathcal{T}}||_{{\mathcal{M}}}^{2} = \frac{1}{2}\sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{N} {\text{W}_{ij} ||{\mathcal{T}}(x_{i} ) - {\mathcal{T}}(x_{j} )||^{2} } } \;{ = \text{tr}(\mathcal{T}}^{\text{T}} {\text{A}\mathcal{T})} $$
(13)

As seen from Eq. (13), which is defined on the basis of a nonnormalized graph Laplacian that is sensitive to the vertex degree of the graph, it is reasonable to use the normalized graph Laplacian \( {\tilde{\text{A}}} = \text{D}^{{{ - 1/2}}} \text{AD}^{{{ - 1/ 2}}} \) instead of \( \text{A} \) in all formulas.

3.5 Global-intrinsic structure preservation

We assume that \( {\tilde{\text{X}}} \) and \( {\tilde{\text{Y}}} \) are in the putative point set \( \text{S} \) and correspond to the first \( L \) rows of the given point sets \( \text{X} \) and \( \text{Y} \), respectively. Based on the TPS energy function (11) and the given graph Laplacian regularization term, the transformation parameters can be estimated by minimizing the following function:

$$ \begin{aligned} {\mathcal{E}}({\text{H,C}}) & = \frac{1}{{2\sigma^{2} }}||{\text{P}}^{{{1/2}}} {(\tilde{\text{Y}}} - {\tilde{\text{XH}}} - {\text{JKC}})||^{2} \\ & + \frac{{\gamma_{1} }}{2}{\text{tr}}({\text{C}}^{\rm T} {\text{J}}^{\rm T} {\mathcal{K}}{\text{JC}}) + \frac{{\gamma_{2} }}{2}{{\text{tr}}({\mathcal{T}}}^{\text{T}} {\tilde{\text{A}}{\mathcal{T}})} \end{aligned} $$
(14)

where \( \text{J}_{L \times N} = (\text{I}_{L \times L} ,0_{L \times (N - L)} ) \), with \( \text{I} \) being a unit matrix and the rest of the elements being 0. Solving for the parameter pair \( \text{H} \) and \( \text{C} \) by applying an iterative updating method in which we take the derivatives of (14) with respect to \( \text{H} \) and \( \text{C} \) and set them equal to zero, which leads to the following linear system:

$$ \begin{aligned} ({\tilde{\text{X}}}^{\text{T}} {\text{P}\tilde{\text{X}} + }\gamma_{2} \sigma^{2} \text{X}^{\text{T}} {\tilde{\text{A}}\text{X}})\text{H} \\ \;\;\;\;\;\; = ({\tilde{\text{X}}}^{\text{T}} {\text{P}\tilde{\text{Y}}} - {\tilde{\text{X}}}^{\text{T}} \text{PJKC} - \gamma_{2} \sigma^{2} \text{X}^{\text{T}} {\tilde{\text{A}}\text{KC}}) \\ \end{aligned} $$
(15)
$$ \begin{aligned} (\text{K}^{\text{T}} \text{J}^{\text{T}} \text{PJK} + \gamma_{1} \sigma^{2} \text{J}^{\text{T}} {\mathcal{K}\text{J}} + \gamma_{2} \sigma^{2} \text{K}^{\text{T}} {\tilde{\text{A}}\text{K}} + \tilde{\varepsilon }\text{I})\text{C} \\ \;\;\;\;\;\; = (\text{K}^{\text{T}} \text{J}^{\text{T}} {\text{P}\tilde{\text{Y}}} - \text{K}^{\text{T}} \text{J}^{\text{T}} {\text{P}\tilde{\text{X}}\text{H}} - \gamma_{2} \sigma^{2} \text{K}^{\text{T}} {\tilde{\text{A}}\text{XH}}) \\ \end{aligned} $$
(16)

where \( \tilde{\varepsilon }\text{I} \) control the stability of the numerical value. Thus, we obtain a stable solution for \( {\mathcal{T}} \) in Eq. (14).

3.6 Fast implementation

The kernel matrix scales poorly for high-dimensional nonrigid registration, which may pose a serious problem because of high computational complexity in time and space. To address this problem, we provide an approximation method as the basis for a fast implementation, with the aim of reducing the computational complexity, based on Ref. [57]. Give a subset \( \{ \hat{x}_{q} \}_{q = 1}^{Q} (Q \ll L) \) drawn from the putative point set \( {\tilde{\text{X}}} \) at random without replacement. Then, we consider replacing \( {\mathcal{T}} \) in Eq. (14) by searching for a solution that allows only the coefficients \( {\hat{\text{C}}} \) to be nonzero. With slightly different notation, we seek a solution that has the following approximate solution:

$$ {\mathcal{T}}(\hat{x}_{q} ) = \hat{x}_{q} \cdot {\hat{\text{H}} + }k\text{(}\hat{x}_{q} ,\hat{x}\text{)} \cdot {\hat{\text{C}}} $$
(17)

where \( k(\hat{x}_{q} ,\hat{x}) \) is a \( 1 \times Q \) vector, \( {\hat{\text{H}}} \) is a \( 3 \times 3 \) matrix and \( {\hat{\text{C}}} \) is an \( Q \times 3 \) matrix. We obtain the optimal coefficients \( {\hat{\text{H}}} \) and \( {\hat{\text{C}}} \) from the following linear system:

$$ \begin{aligned} ({\tilde{\text{X}}}^{\text{T}} {\text{P}\tilde{\text{X}} + }\gamma_{2} \sigma^{2} \text{X}^{\text{T}} {\tilde{\text{A}}\text{X}}){\hat{\text{H}}} \\ \;\;\;\;\;\; = ({\tilde{\text{X}}}^{\text{T}} {\text{P}\tilde{\text{Y}}} - {\tilde{\text{X}}}^{\text{T}} {\text{PJU}\hat{\text{C}}} - \gamma_{2} \sigma^{2} \text{X}^{\text{T}} {\tilde{\text{A}}\text{U}\hat{\text{C}}}) \\ \end{aligned} $$
(18)
$$ \begin{aligned} (\text{U}^{\text{T}} \text{J}^{\text{T}} \text{PJU} + \gamma_{1} \sigma^{2} \text{G} + \gamma_{2} \sigma^{2} \text{U}^{\text{T}} {\tilde{\text{A}}\text{U}} + \tilde{\varepsilon }\text{I}){\hat{\text{C}}} \\ \;\;\;\;\;\; = (\text{U}^{\text{T}} \text{J}^{\text{T}} {\text{P}\tilde{\text{Y}}} - \text{U}^{\text{T}} \text{J}^{\text{T}} {\text{P}\tilde{\text{X}}\hat{\text{H}}} - \gamma_{2} \sigma^{2} \text{U}^{\text{T}} {\tilde{\text{A}}\text{X}\hat{\text{H}}}) \\ \end{aligned} $$
(19)

where \( \text{U} \) is the TPS kernel matrix \( \text{U} \in {\mathbb{R}}^{N \times Q} \) with elements \( \text{U}_{iq} = k(x_{i} ,\hat{x}_{q} ) \) and \( \text{G} \in {\mathbb{R}}^{Q \times Q} \) with elements \( \text{U}_{ij} = k(\hat{x}_{i} ,\hat{x}_{q} ) \). Compared to the linear system in Eq. (16), the time complexity of computing the coefficients in Eq. (19) is reduced from \( {\mathcal{O}}(N^{3} ) \) to \( {\mathcal{O}}(MN^{2} ) \). The spatial complexity of our fast implementation can be considered to be \( {\mathcal{O}}(N^{2} ) \) due to the memory requirements for storing the graph Laplacian matrix \( {\tilde{A}} \).

3.7 Implementation details

The registration performance of our method typically depends on the homogeneous coordinate system, in which the coordinates of the points in both sets have zero mean and unit variance.

Parameter settings: There are six main parameters in our method, and all six parameters are set on the basis of experimental evaluations. The regularization parameters include \( \gamma_{1} \) and \( \gamma_{2} \), which are used to control the effects of the global and intrinsic topological constraints, respectively, on the transformation \( {\mathcal{T}} \); we fix them to values of \( \gamma_{1} = 5 \) and \( \gamma_{2} = 0.05 \). The parameter \( a \) of the uniform distribution is set to \( a = 10 \). For the inlier ratio \( \omega \) of the Gaussian distribution, we adopt the initial assumption that this parameter has a fixed value of 0.9. The parameter \( \varsigma \), which is used to identify the reliability of the inliers, is fixed to 0.75. The parameter \( r \), which is used to construct the edge-weight matrix, is empirically set to 0.05. We summarize our approach in algorithm 1.

figure e

4 Experiments and analysis

In this section, we evaluate the performance of our proposed method on both contour images and real-world images for comparison with other state-of-the-art methods: GLTP [45, 46], EM-TPS [39], MR-RPM [41, 42], GMMREG [32], and CPD [35]. All tests were run directly in the MATLAB environment on a standard PC with an Intel i5 3.4 GHz CPU.

We consider the registration error to evaluate the performance of the registration methods. The error is defined as the root mean square error, which directly characterizes the registration performance; it is calculated as follows:

$$ \text{Error} = \sqrt {\frac{1}{{N_{{\text{total}}} }}\sum\limits_{k = 1}^{{N_{{\text{total}}} }} {(y_{k} - {\mathcal{T}}(x_{k} ))^{2} } } $$
(20)

where \( N_{{\text{total}}} \) denotes the total number of true correspondences and \( y_{k} \) is the ground truth correspondence with \( x_{k} \).

4.1 Overall performance on contour images

The contour images considered here [30] are based on two category shape models: a fish and a Chinese character. The fish and Chinese character datasets contain 98 and 105 points, respectively. For each shape model, four groups of point sets with different types of degradation, including deformation, noise, occlusion, and outliers, are designed to test the performance of various registration methods. For the fish and Chinese character datasets, there are five or six levels of degradation, and 100 samples are generated for each degradation level.

4.1.1 Nonrigid fish point set registration

The registration results obtained on the fish shape model are illustrated in Fig. 1 for the largest degree of degradation in each category, such as deformation level to 0.08, noise level to 0.05, occlusion ration to 0.5, and outlier ratio to 2. As shown in these results, MR-RPM and our method perform well for nonrigid deformation. In the noise experiment, we see that EM-TPS can align the two point sets. However, it fail to generate the correct correspondence. In the occlusion and outlier tests, our method gets superior registration results.

Fig. 1
figure 1

Point set registration results for fish shape samples with the largest degradations, from the top row to the bottom row: deformation (0.08), noise (0.05), occlusion (0.5), and outliers (2). The leftmost column shows the model point set (blue pluses) and the scene point set (red circles). The goal is to align the model point set with the scene point set. In the second column from the left to the rightmost column, the registration of GLTP, EM-TPS, GMMREG, MR-RPM, CPD, and our method, respectively, is presented

To further explore the registration performance, we consider the error bars of the registration results to evaluate the performance of the registration methods. The error bars represent the means and standard deviations of the registration errors on all 100 samples for each level of the various types of degradation. We summarize the experimental results in Fig. 2. According to Fig. 2a, we observe that our method and MR-RPM perform better due to adopting the intrinsic topological constraints. In Fig. 2b, there is no winner when the model point set is disturbed by the white Gaussian noise. Nevertheless, our method can achieve quite satisfactory results, followed by MR-RPM. Figure 2c and d show the results for the cases of occlusion and outliers. Our method performs bests because of constructing correspondences for contour points under an ordering constraint and integrating two topologically complementary constraints into a unified optimization framework. Figure 2e–h separately show the average time consumptions of the different registration methods at the different degradation levels. We can see GLTP spend more time under deformation, noise, and occlusion test. Meanwhile, in the outlier test, the more time consumption of EM-TPS, MR-RPM and our proposed method when the outlier level becomes relatively high, such as 1.5 or 2.

Fig. 2
figure 2

Comparison of the performance of the registration methods for point sets based on the fish shape model with different levels of degradation. a, e: Deformation test. b, f: Noise test. c, g: Occlusion test. d, h: Outlier test

In summary, the registration performance of all methods gradually declines as the degree of degradation increases. Our method performs more desirably in general because of integrating global and intrinsic topological constraints into a GMM-based framework and modeling the transformation using the TPS approach. GLTP performs wells in the deformation and noise tests. However, it is sensitive to occlusion and outliers because the LLE constraints are less reliable when the point sets are sparse or nonuniform. The EM-TPS method performs poorly in most cases, and its registration error shows strong fluctuations because it only consider the global distribution of the model point set. The MR-RPM method achieves good performance in deformation and noise tests because of the manifold regularization technique. However, it is vulnerable to occlusion and outliers, because the initial correspondences are likely inaccurate. GMMREG is not robust in most cases, especially when the data point set is badly degraded. The alignment performance of CPD is relatively good in deformation and noise tests and spends less time than others in all tests, but it is still sensitive to occlusion and outliers due to the impossibility of capturing the true distributions of the point sets.

4.1.2 Nonrigid the Chinese character point set registration

The performance in the Chinese character experiment is displayed in Fig. 3 for the highest level of each type of degradation. Our method and MR-RPM perform better than the other four methods in the deformation test. In the noise test results, we observe that the model point set is distorted by Gaussian noise, which disturbs the neighborhood structures in the model point set. It is extremely difficult for all methods to achieve accurate alignment. When there are missing points or outliers in the model point set, we observe that our method gives the best registration results.

Fig. 3
figure 3

Point set registration results for Chinese character samples with the largest degradations, from the top row to the bottom row: deformation (0.08), noise (0.05), occlusion (0.5), and outliers (2). The leftmost column shows the model point set (blue pluses) and the scene point set (red circles). The goal is to align the model point set with the scene point set. In the second column from the left to the rightmost column, the registration of GLTP, EM-TPS, GMMREG, MR-RPM, CPD, and our method, respectively, is presented

Quantitative comparative experiment results on the Chinese character dataset under four degradation scenarios are shown in Fig. 4. In Fig. 4a, the average error of all methods gets larger as the level of degradation goes up. The growth rate of the average error is much slower for MR-RPM and our method than for the other methods. In Fig. 4b, all methods except EM-TPS can obtain satisfactory results, our proposed method performs slightly better than the other tested methods with the noise level ranging from 0 to 0.05. Figure 4c and d present the registration results of all methods in the presence of occlusion and outliers, respectively. GLTP, EM-TPS, GMMREG, MR-RPM, and CPD are not robust to occlusion and outliers and are easily confused at a relatively high degradation level. This is the result of their failure to establish reliable correspondences. In contrast, our proposed method obtain the best registration results because of the correspondences are established subject to the ordering information of the contour points using local descriptors. Figure 4e–h show the average time consumptions for different degradation types and degradation levels. Compared to the fish shape, the contour points of the Chinese character model are more spread out around the shape, and the shape structure is more complex. Consequently, more time is consumed on the Chinese character dataset than on the fish dataset. We see that all methods achieve similar time performance on both the fish and Chinese character datasets, although they need less time to accomplish the registration for each degradation level and each type of degradation in the former case.

Fig. 4
figure 4

Comparison of the performance of the registration methods for point sets based on the Chinese character shape model with different levels of degradation. a, e: Deformation test. b, f: Noise test. c, g: Occlusion test. d, h: Outlier test

In summary, the overall trend of error bars on the Chinese character dataset is similar to that on the fish dataset. Our proposed method obtains better assignments in almost all cases under all four degradation scenarios. Both the fish and Chinese character results show that our proposed method is effective for the nonrigid registration of 2D point sets.

4.2 Experimental and analysis on real-world image data

In this subsection, to test and compare the performance of our proposed method on real-world images, we conducted experiments on the Tarragona palmprint [58] and CASIA fingerprint image databases [59]. We collect two groups of image sequences with nonrigid transformation form each database. Here, each sequence contains a series of five homologous images. We randomly choose an image of each sequence is used as the model image, and the others are treated as scene images; thus, we can obtain 4 image pairs from each sequence.

In this experiment, we first constructed the model and scene point sets (150 point pairs) between each pairs of images based on the open source VLFeat toolbox [60], and then the initial correspondences were detected by means of nearest neighbor strategy between the model and scene point sets. Note that due to the limitations of this strategy, not all estimated initial correspondences will be correct. Finally, we aligned the model point set with the scene point set to partition the initial correspondences into inliers and outliers and capture the visual correspondences between the model and scene point sets.

We select several commonly used metrics, namely the accuracy, precision, and recall to evaluate the performance. The accuracy is the proportion of true results among the total number of point pairs. The precision is the proportion of true inliers among the total number of identified inliers. The recall is the proportion of true inliers among the total number of initial correspondences. Specifically, the accuracy, precision, and recall, which are used to evaluate the feature registration results, are defined as:

$$ \text{Accuracy} = \frac{{\text{TP} + \text{TN}}}{{\text{TP} + \text{TN} + \text{FP} + \text{FN}}} $$
(21)
$$ \text{Precision} = {{\text{TP}} /{(\text{TP} + \text{FP})}} $$
(22)
$$ \text{Recall} = {{\text{TP}}/ {(\text{TP} + \text{FN})}} $$
(23)

where the terms \( \text{TP} \), \( \text{FP} \), \( \text{TN} \), and \( \text{FN} \) denote true positives, false positives, true negatives, and false negatives, respectively.

Under the assumption that the initial correspondences are the ground truth, Fig. 5 and Fig. 6 show the qualitative results of our proposed approach based on the palmprint and fingerprint image databases. Clearly, almost all of the inliers (\( \text{TP} \)) and outliers (\( \text{FN} \)) among the initial correspondences are correctly distinguished, and the inliers (TP + FP) between the model and scene point sets can also be established using our approach. According to the experimental results, the performance of our method is better than that of the nearest neighbor strategy. Tables 1, 2, 3, and 4 show the accuracy, precision, and recall for every image sequence, and we see that our method achieves higher accuracy and precision values in most cases, indicating its strong ability to identify inliers. Meanwhile, we note that our method achieves a low recall value on most test image pairs, which means that many outliers are falsely captured among the initial correspondences by the nearest neighbor strategy, and our method can effectively identify outliers in the initial set for each image pair.

Fig. 5
figure 5

Qualitative results on right palmprint image pairs (first two columns) and left palmprint image pairs (last two columns) obtained using our method. From the top row to the bottom row, the registration performance is shown for image pairs ‘1 and 2’ to ‘1 and 5’. The feature point alignment is shown within each set of experiments (the model point set: blue pluses, the scene point set: red circles). The lines indicate the matching results, where the blue (TP) and yellow lines (FN) denote the preserved inliers and removed outliers among the initial correspondences, respectively, and the blue (TP) and red lines (FP) codenote the identified inliers between the model and scene point sets

Fig. 6
figure 6

Qualitative results on right fingerprint image pairs (first two columns) and left fingerprint image pairs (last two columns) obtained using our method. From the top row to the bottom row, the registration performance is shown for image pairs ‘1 and 2’ to ‘1 and 5’. The feature point alignment is shown within each set of experiments (the model point set: blue pluses, the scene point set: red circles). The lines indicate the matching results, where the blue (TP) and yellow lines (FN) denote the preserved inliers and removed outliers among the initial correspondences, respectively, and the blue (TP) and red lines (FP) codenote the identified inliers between the model and scene point sets

Table 1 Comparison of registration results on right palmprint image datasets in terms of accuracy, precision, and recall (Em: evaluation metric)
Table 2 Comparison of registration results on left palmprint image datasets in terms of accuracy, precision, and recall (Em: evaluation metric)
Table 3 Comparison of registration results on right fingerprint image datasets in terms of accuracy, precision, and recall (Em: evaluation metric)
Table 4 Comparison of registration results on left fingerprint image datasets in terms of accuracy, precision, and recall (Em: evaluation metric)

Next, we present a quantitative comparison, including the total numbers of identified inliers (\( \text{TP + FN} \)) and the corresponding average errors between the model and scene point sets. The results are illustrated in Fig. 7. EM-TPS has a better average error than the other methods, followed by our method and MR-RPM; these findings illustrate that these methods achieve more robust results in identifying inliers. However, EM-TPS ignores the importance of preserving the intrinsic geometrical structure information among the point sets and balancing the intrinsic and global constraints during the transformation, which forces disruption of the local structure of the point sets during registration. Thus, the registration results do not generate satisfactory alignments because some outliers may be falsely identified as inliers, as shown in Fig. 8. Our method and MR-RPM identify similar numbers of inliers. However, considering both accuracy and precision, our method shows slightly better performance.

Fig. 7
figure 7

Comparison of results on real image datasets. The total numbers of identified inliers are shown on the histogram end caps, and the average errors are plotted

Fig. 8
figure 8

Qualitative results on left palmprint image pairs ‘1 and 2’ (top row) and left fingerprint image pairs ‘1 and 2’ (bottom row) obtained using EM-TPS

In addition, the corresponding correct match ratios for every dataset are shown in Figs. 9 and 10. The correct match ratio is the proportion of identified inliers among the total number of point pairs (150 point pairs). As depicted in these two figures, the experimental results of EM-TPS are stable or vary only slightly as the distance level increases. Intuitively, it appears to perform the best due to assigning point matches that capture only the global distributions without considering the intrinsic geometrical structure information, which disturbs the local neighborhood structures of the input data and forces some point pairs between the model and scene point sets achieve accurate alignments. However, the feature correspondences in this method are usually established based on only global spatial relationship, and hence some unknown false matched will be introduced which degrade the homing performance; therefore, this method cannot meet the needs of a real engineering environment.

Fig. 9
figure 9

Performance of all tested methods on right palmprint (right column) and left palmprint (left column) datasets in terms of the correct match ratio

Fig. 10
figure 10

Performance of all tested methods on right fingerprint (right column) and left fingerprint (left column) datasets in terms of the correct match ratio

In some cases, the correct match rates of GLTP indicate that the desired results can be obtained in a distance range of 0–4. However, there are fewer captured correspondences as the distance between the corresponding features increases. The GMMREG and CPD methods fail due to the input data are contaminated by outliers. In contrast, MR-RPM and our method achieve better objective performance among the compared approaches. Overall, they can establish more reliable correspondences than the other methods.

5 Conclusion

In this paper, we propose a novel approach for nonrigid point set registration, in which the registration problem is formulated as a probabilistic model that is solved using an iterative EM method. A key characteristic of our approach is that it incorporates a TPS-based constraint and a spectral graph-based constraint into a unified GMM-based framework to preserve the global and intrinsic topological characteristics of the point sets after transformation. In addition, the spatial transformation related to the model point sets is parameterized using the TPS approach, allowing the transformation to be decomposed into affine and nonaffine components. This TPS-based minimization framework has the advantages of free parameters and closed-form solutions in the cases of different degradations. We experimentally tested our approach on contour image and real image datasets and compared it to GLTP, EM-TPS, GMMREG, MR-RPM, and CPD, and the results demonstrate the superiority of our proposed approach in most scenarios.