Introduction

Suppose we are given a pair of pictures of the same scene from different angles, represented by grayscale images called the source and target. We will denote them as functions \(I_{j} :{\mathcal {D}}_{j} \rightarrow {\mathbb {R}}\) for \(j=1,2\), respectively, where both domains are the unit square \({\mathcal {D}}_{1}={\mathcal {D}}_{2}=[0,1]\times [0,1]\), and the value \(I_{j}(p)=I_{j}(x,y)\) is the intensity. The goal of image correspondence is to find a suitable transformation \(T: {\mathcal {D}}_{1}\rightarrow {\mathcal {D}}_{2}\) with the property that \(E(T)\) is as small as possible, where E is some cost function measuring the difference between \(I_{1}(x,y)\) and the composition \(I_{2}(T(x,y))\). In typical applications, the correspondence \(T\) might be considered suitable if it is continuous, differentiable, of bounded variation, or satisfies some other regularity criterion. If the source and target come from cameras at the same height (e.g. a person’s left and right eyes), it is natural to require that \(T\) satisfy the “epipolar” condition that corresponding pairs \(p,T(p)\) have the same y-coordinate. Writing \(\varphi\) for a candidate correspondence, a basic example of a cost function is the \(L^2\)-measure

$$\begin{aligned} E(\varphi )=\int _{{\mathcal {D}}_{1}} \left\Vert I_{1}(p)-I_{2}(\varphi (p))\right\Vert ^2 ~\mathrm{d}p, \end{aligned}$$
(1)

which might also contain a term penalizing high variation in \(\varphi\). Another possibility if \(I_{1},I_{2}\) are continuous is to replace the integral in (1) with the supremum over \({\mathcal {D}}_{1}\).
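As a rough illustration of (1), the following sketch evaluates a discretized version of the \(L^2\) cost for a rectified pair. The function name `l2_cost`, the nested-list image format, and the per-pixel integer `shift` field are assumptions made here for illustration, not part of the construction in this paper; columns that would leave the image are simply clamped.

```python
def l2_cost(source, target, shift):
    """Discrete analogue of Eq. (1) for a rectified image pair.

    source, target: lists of rows of grayscale intensities, same shape.
    shift: list of rows of integers; pixel (r, c) of the source is matched
           to pixel (r, c + shift[r][c]) of the target, so rows are kept.
    """
    rows, cols = len(source), len(source[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            # Clamp the matched column so it stays inside the target image.
            m = min(max(c + shift[r][c], 0), cols - 1)
            total += (source[r][c] - target[r][m]) ** 2
    # The mean of squared differences approximates the integral over the unit square.
    return total / (rows * cols)
```

For a target that is a horizontal translate of the source, the cost of the matching shift field is small (nonzero only where clamping occurs), while the zero shift gives a much larger value.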

A common problem with the optimization approach is that \(E(\varphi )\) may have local minimizers. Some are simply due to noise in the images, which can be resolved by a smoothing of the objective function, for instance by blurring the images. More serious ones occur when the source image matches with more than one region in the target, or when there is occlusion by foreground objects, so that regions in the source do not have a unique match in the target. One way to deal with this is to only study key points of the two images, as is the case for the highly successful SIFT and SURF methods [1, 6].

In this paper, rather than smooth away local basins, we propose a novel construction which uses persistent homology to classify them, and to identify regions in which they are present. We present a new filtered simplicial complex associated to a pair of images, with the property that its persistent homology groups encode robust categories of correspondences, which respect continuous deformation in a particular sense defined in “A Deformation Invariance Result”. In “Practical Examples”, we present practical examples in which confounding properties of image pairs such as repeated patterns can be identified with long bars in the persistence diagram of the corresponding complex.

There are several ways such classifiers could be used to address the original correspondence problem. First, if persistent homology detects the presence of one or more correspondences in a particular domain, one could devise algorithms for producing one or more correspondences that represent a given homology class, which is part of the general problem of producing cycle representatives of homology classes when they exist. Such representatives may then serve as highly informed initial guesses that avoid obstructions due to confounding features of the image detected by topology. Conversely, the absence of persistent homology classes represents a certificate that there are no viable correspondences in a particular domain, as described in Corollary 1 below. The solver would therefore be well served to move on to another region of the source images, rely on other cameras, or move on to a different formulation of the correspondence problem. This leads to the possibility of searching for subregions of the source image of maximal size that are topologically expected to contain a unique correspondence.

Definitions

In this section, we recall some relevant concepts from persistent homology, and formulate the correspondence problem.

Persistent Homology

In this paper, a simplicial complex X will mean an abstract simplicial complex with no particular embedding in space. In other words, X is a collection of subsets \(\Delta\) of some index set S such that if \(\Delta \in X\), and \(\Delta ' \subset \Delta\), then \(\Delta '\in X\). The subsets \(\Delta '\subset \Delta\) are called the faces of \(\Delta\). Its geometric realization will be denoted |X|. For any i, let \(X_i\) denote the i-dimensional simplices. Fix a field \({\mathbb {F}}\), and denote the set of i-chains, i-cycles, i-boundaries, and the boundary operator over \({\mathbb {F}}\) by \(C_i(X),Z_i(X),B_i(X)\), and \(\partial\), respectively. In applications, it is most efficient to let \({\mathbb {F}}\) be a finite field. If \(A \subset X\) is a subcomplex, then we have the long exact sequence in (reduced) relative homology,

$$\begin{aligned} \cdots \rightarrow H_{i}(A)\rightarrow H_{i}(X) \rightarrow H_{i}(X,A)\xrightarrow {\delta } H_{i-1}(A)\rightarrow \cdots , \end{aligned}$$
(2)

where \(H_i(X,A)\) is the relative homology group, and \(\delta\) is the connecting homomorphism.

If X is induced from a triangulation of a closed region with boundary in \({\mathbb {R}}^n\), then its boundary is the image of a subcomplex \(\partial X \subset X\). The orientation on \({\mathbb {R}}^n\) determines a well-defined fundamental class

$$\begin{aligned}{}[X]=\sum _{\Delta \in X_n} \pm e_\Delta \in H_n\left( X,\partial X\right) , \end{aligned}$$

where there is a minus sign if the ordering of the indices on \(\Delta\) is the reverse of the orientation on \({\mathbb {R}}^n\).

A filtration function on X is a function \(f : X\rightarrow {\mathbb {R}} \cup \{\infty \}\) such that whenever \(\Delta '\) is a face of \(\Delta \in X\), we have \(f(\Delta ')\le f(\Delta )\). For any a, the set

$$\begin{aligned} X^{a}=\left\{ \Delta \in X: f\left( \Delta \right) \le a\right\} , \end{aligned}$$

is a subcomplex, with an inclusion map \(\iota ^{a,b} : X^{a} \rightarrow X^{b}\) for \(a\le b\). A complex together with a filtration function is called a filtered complex. We will denote \(C^a_i(X)=C_i(X^{a})\), and similarly for the cycles, boundaries, and homology groups. Then \(\iota ^{a,b}\) induces an inclusion map \(i^{a,b}_* : C_i^a(X)\hookrightarrow C_i^b(X)\) that commutes with the boundary operator, which in turn induces a map \(H_i^a(X)\rightarrow H_i^b(X)\) which need not be injective or surjective. For \(a\le b\), let \(H_i^{a,b}(X)\) denote the persistent homology group, which is the image of \(H_i^a(X)\) in \(H_i^b(X)\).

For any numbers \(a\le b\), we have a nonnegative integer

$$\begin{aligned} \rho _{X}(a,b)=\dim H^{a,b}_i(X)={{\,\mathrm{rk}\,}}\left( H_i^a(X)\rightarrow H_i^b(X)\right) . \end{aligned}$$

The ranks are encoded in the barcode diagram [2, 4, 9], which is the unordered collection of intervals in \({\mathbb {R}}_+ \cup \{\infty \}\), with the property that

$$\begin{aligned} \rho _{X}(a,b)= \#\left\{ [c,d]\in {{\,\mathrm{barcode}\,}}(X):[a,b]\subset [c,d]\right\} . \end{aligned}$$
(3)

It is constructed by assuming that \(a,b\) take values in a discrete set \({\mathbb {Z}}\cdot \epsilon \subset {\mathbb {Q}}\), and then considering all the homology groups at once as a graded module

$$\begin{aligned} {\mathbb {F}}[x] \rightarrow M=\bigoplus _{a} H^{a}(X),\quad x \mapsto \iota =\sum _{a} \iota ^{a,a+\epsilon }_*. \end{aligned}$$

The barcode is then determined by decomposing M as a module over a principal ideal domain. For an explanation of how these barcodes can be generated, we refer to the JavaPlex tutorial [8].
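To make the barcode concrete, here is a minimal sketch of the standard column-reduction algorithm for persistence intervals. It is not the JavaPlex implementation: it assumes the field \({\mathbb {F}}\) has two elements (so signs can be ignored), and the function name `barcode` and the input format are hypothetical choices made for this illustration.

```python
import math

def barcode(simplices):
    """Persistence intervals over the field with two elements.

    simplices: list of (value, vertices) pairs describing a filtered complex;
    every face of a listed simplex must also be listed, with a value that is
    no larger. Returns (dimension, birth, death) triples.
    """
    # Sort by filtration value, then by dimension, so faces precede cofaces.
    order = sorted(simplices, key=lambda s: (s[0], len(s[1])))
    index = {frozenset(v): k for k, (_, v) in enumerate(order)}
    pivots = {}     # largest row index -> reduced column having that pivot
    paired = set()  # positions of simplices that belong to a finite interval
    bars = []
    for k, (value, verts) in enumerate(order):
        verts = tuple(verts)
        column = set()
        if len(verts) > 1:
            # Boundary mod 2: all faces obtained by dropping one vertex.
            column = {index[frozenset(verts[:m] + verts[m + 1:])]
                      for m in range(len(verts))}
        # Cancel the pivot against earlier reduced columns (addition mod 2).
        while column and max(column) in pivots:
            column ^= pivots[max(column)]
        if column:
            low = max(column)
            pivots[low] = column
            paired.update((low, k))
            birth = order[low][0]
            if value > birth:  # skip intervals of length zero
                bars.append((len(order[low][1]) - 1, birth, value))
    # Simplices that create a class which is never killed give infinite bars.
    for k, (value, verts) in enumerate(order):
        if k not in paired:
            bars.append((len(verts) - 1, value, math.inf))
    return bars
```

For example, a hollow triangle whose third edge enters at value 2 and whose interior enters at value 3 produces a single one-dimensional bar from 2 to 3, together with the expected zero-dimensional bars.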

Image Correspondences

Denote a source and target image \(I_{1},I_{2}\) respectively by functions from the unit square

$$\begin{aligned} I_{j} : {\mathcal {D}}_{j} \rightarrow C, \quad {\mathcal {D}}_{1}={\mathcal {D}}_{2}=[0,1]\times [0,1], \end{aligned}$$

into some set of possible colors C, for instance \([0,1]^3\) for color images or [0, 1] for grayscale. We suppose that both images are pictures of the same scene, taken from different predetermined locations and angles. For each \(p\in {\mathcal {D}}_{1}\) belonging to the domain of the source image, we choose a parametrization of the epipolar line of the form

$$\begin{aligned} q_p : [0,1] \rightarrow {\mathbb {R}}^2,\quad q_{(x,y)}(t)=\left( x+ct+d,y\right) , \end{aligned}$$

for some numbers \(c,d\) determined by the relative placement of two cameras, and lower and upper bounds on the x coordinate of corresponding points. In other words, the line parametrizes the points in the target which could be in correspondence with points in the source, assuming the cameras are parallel to the ground, at the same height, and pointed in the same direction. Our examples are all of this form, which is called being rectified, but in general, the parametrization could be more complicated.

Now select a triangulation of \({\mathcal {D}}_{1}\) represented by an inclusion of a pure 2-dimensional simplicial complex \(p : |K|\rightarrow {\mathcal {D}}_{1}\). Suppose we are also given a continuous distance function \(d({\mathcal {T}},{\mathcal {T}}')\) for every pair of triangles \({\mathcal {T}}\subset {\mathcal {D}}_{1}\), \({\mathcal {T}}'\subset {\mathcal {D}}_{2}\), measuring the distance between the restriction of \(I_{1}\) to \({\mathcal {T}}\), and the restriction of \(I_{2}\) to \({\mathcal {T}}'\). The basic function we will use in the examples section is to affinely map \({\mathcal {T}}\) and \({\mathcal {T}}'\) to the unit right triangle whose vertices are (0, 0), (1, 0), and (0, 1), and evaluate the \(L^2\) metric between them. That is,

$$\begin{aligned}&d\left( \left[ p_0,p_0+u_1,p_0+u_2\right] ,\left[ q_0,q_0+v_1,q_0+v_2\right] \right) =\nonumber \\&\int _{s=0}^1 \int _{t=0}^{1-s} d_C\left( I_{1}(p_0+su_1+tu_2),I_{2}(q_0+sv_1+tv_2)\right) \mathrm{d}t\, \mathrm{d}s, \end{aligned}$$
(4)

where \(d_C\) is just the \(L^2\)-metric on the color space \(C=[0,1]^3\). Another extension we will use involves a penalty term when \({\mathcal {T}}'\) is highly warped, meaning it is very far from being equilateral, or has the opposite orientation as \({\mathcal {T}}\).
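The integral in (4) can be approximated by a sample average over the unit right triangle. The sketch below is one crude way to do this and is not the implementation used in the examples: the name `triangle_distance`, the callable image format, and the resolution parameter `n` are assumptions introduced here.

```python
import math

def triangle_distance(img1, img2, tri1, tri2, n=8):
    """Sample-average approximation of Eq. (4).

    img1, img2: callables mapping (x, y) to a tuple of color channels.
    tri1, tri2: triples of (x, y) vertices, read as [p0, p0+u1, p0+u2]
                and [q0, q0+v1, q0+v2].
    n: number of subdivisions per side (hypothetical resolution parameter).
    """
    def affine(tri, s, t):
        # Image of (s, t) in the unit right triangle under the affine map.
        (x0, y0), (x1, y1), (x2, y2) = tri
        return (x0 + s * (x1 - x0) + t * (x2 - x0),
                y0 + s * (y1 - y0) + t * (y2 - y0))

    total, count = 0.0, 0
    for a in range(n):
        for b in range(n - a):
            # Sample point strictly inside the triangle: s + t < 1.
            s, t = (a + 1 / 3) / n, (b + 1 / 3) / n
            c1 = img1(*affine(tri1, s, t))
            c2 = img2(*affine(tri2, s, t))
            total += math.dist(c1, c2)  # Euclidean distance in color space
            count += 1
    # The unit right triangle has area 1/2.
    return 0.5 * total / count
```

With two constant images whose colors are at distance one, the value is exactly 1/2, the area of the reference triangle.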

Define a simplicial correspondence to be an element of the set

$$\begin{aligned} {{\,\mathrm{\Gamma }\,}}(K)=\left\{ \varphi : K_0 \rightarrow I\right\} , \end{aligned}$$

of functions on the vertex set with values in the unit interval \(I=[0,1]\), each representing a piecewise linear function on \(|K|={\mathcal {D}}_{1}\). The cost function \(E : {{\,\mathrm{\Gamma }\,}}(K)\rightarrow {\mathbb {R}}_+\) is given by the worst-case triangle

$$\begin{aligned} E(\varphi )=\max _{\left[ i_0,i_1,i_2\right] \in K} d\left( \left[ p_0,p_1,p_2\right] ,\left[ q_{p_0}\left( \varphi (i_0)\right) ,q_{p_1}\left( \varphi (i_1)\right) , q_{p_2}\left( \varphi (i_2)\right) \right] \right) , \end{aligned}$$
(5)

for \(p_a=p(i_a)\).

Main Construction

In this section, we present the main object of the paper, define the complex described in the introduction in the case of two-dimensional images, and present the general pipeline to be followed in the examples of “Practical Examples”.

Illustration in the Continuous Case

We describe the idea in the case of one-dimensional images. Suppose \(I_{1},I_{2}\) are functions on the interval \({\mathcal {D}}_{1}={\mathcal {D}}_{2}=[0,1]\). Consider a correspondence to be a continuous increasing function \(T:{\mathcal {D}}_{1}\rightarrow {\mathcal {D}}_{2}\), with the property that

$$\begin{aligned} d\left( x,T(x)\right) =\left| \left| I_{1}(x)-I_{2}\left( T(x)\right) \right| \right| ^2\le a, \end{aligned}$$

for some upper bound \(a>0\) on the dissimilarity between \(I_{1}\) and \(I_{2}\). For the functions in Fig. 1, and \(a>0\), there will be infinitely many correspondences. However, one can see that there are essentially three groups of them up to continuous deformation, corresponding to the upward and rightward moving paths from the lower boundary of Fig. 1c to the upper one, avoiding the shaded regions which have high dissimilarity (the fact that the three paths only move up and to the right is equivalent to the requirement that T has to be increasing).
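A discrete version of this picture is easy to experiment with. The following sketch samples both signals on uniform grids and tests whether the sublevel set \(\{d\le a\}\) contains a monotone staircase from the lower boundary to the upper one; it only decides existence and does not count the deformation classes. The function name and the staircase discretization (steps up, right, or diagonal) are assumptions made for illustration.

```python
def has_correspondence(sig1, sig2, a):
    """Check for a monotone path in the sublevel set {d <= a}.

    sig1, sig2: lists of samples of I_1 and I_2 on uniform grids.
    A path starts anywhere in row 0 (x = 0), ends anywhere in the last
    row (x = 1), and moves only up, right, or diagonally up-right.
    """
    n, m = len(sig1), len(sig2)
    reach = [[False] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if (sig1[i] - sig2[j]) ** 2 > a:
                continue  # shaded cell: dissimilarity exceeds the bound
            reach[i][j] = (
                i == 0
                or reach[i - 1][j]
                or (j > 0 and (reach[i][j - 1] or reach[i - 1][j - 1]))
            )
    return any(reach[-1])
```

For instance, an increasing signal matched against its own reversal admits no such path at \(a=0\), but does once \(a\) is large enough that no cell is shaded.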

Fig. 1

A source and target function, shown in (a, b), where the source is on the vertical axis and the target is on the horizontal axis. A correspondence is an increasing continuous function \(T\) satisfying \(d(x,T(x))\le a\). For \(a\) relatively small, there are essentially three such correspondences for these two signals, corresponding to three up and right moving paths from the bottom to the top of (c) avoiding the shaded regions, for which \(d(x,y)>a\)

Let \({\mathcal {X}}={\mathcal {D}}_{1}\times {\mathcal {D}}_{2}\) be the set of \((x,y)\) pairs in Figure 1c. We have the persistent homology group for the sublevel set filtration

$$\begin{aligned} H_i^{a,b}({\mathcal {X}})={{\,\mathrm{im}\,}}\left( H_i\left( {\mathcal {X}}^a\right) \rightarrow H_i\left( {\mathcal {X}}^b\right) \right) , \end{aligned}$$
(6)

for any \(a<b\), where \({\mathcal {X}}^a=d^{-1}(-\infty ,a]\subset {\mathcal {X}}\). The relevant barcodes for these groups are shown in Fig. 2. Let \(\pi :{\mathcal {X}}\rightarrow {\mathcal {D}}_{1}\) be the projection \(\pi (x,y)=x\), and let

$$\begin{aligned} {\mathcal {A}}=\pi ^{-1}\left( \partial {\mathcal {D}}_{1}\right) =\pi ^{-1}(\{0,1\})= \{0,1\}\times [0,1], \end{aligned}$$

be the inverse image of the boundary \(\partial {\mathcal {D}}_{1}\). We now similarly have persistent homology for the restriction of d to \({\mathcal {A}}\), and also the relative homology groups \(H^{a,b}_i({\mathcal {X}},{\mathcal {A}})\).

Fig. 2

The barcodes for the zeroth and first persistent homology groups of the function \(d:[0,1]\times [0,1]\rightarrow {\mathbb {R}}\) from Figure 1c

A correspondence then induces a continuous section \({\tilde{T}} :{\mathcal {D}}_{1}\rightarrow {\mathcal {X}}\) satisfying

$$\begin{aligned} \pi \left( {\tilde{T}}(x)\right) =x,\quad d\left( x,{\tilde{T}}(x)\right) \le a, \end{aligned}$$

for all x. It induces a persistent homology class

$$\begin{aligned}{}[T]_{a,b}= {\tilde{T}}_*\left( [{\mathcal {D}}_{1}]\right) \in {{\,\mathrm{im}\,}}\left( H_1\left( {\mathcal {X}}^a,{\mathcal {A}}^a\right) \rightarrow H^{a,b}_1\left( {\mathcal {X}},{\mathcal {A}}\right) \right) , \end{aligned}$$
(7)

for any \(b>a\), where

$$\begin{aligned}{}[{\mathcal {D}}_{1}]\in H_1\left( {\mathcal {D}}_{1},\partial {\mathcal {D}}_{1}\right) \cong H_1\left( S^1\right) , \end{aligned}$$

is a generator of the first homology of the quotient \({\mathcal {D}}_{1}/\partial {\mathcal {D}}_{1}\), which is homeomorphic to the circle. If \(T\sim T'\) are homotopic in \(({\mathcal {X}}^b,{\mathcal {A}}^b)\), meaning \(T\) can be deformed into \(T'\) without ever leaving \({\mathcal {X}}^b\), keeping the endpoints in \({\mathcal {A}}^b\), they will induce the same class \([T]_{a,b}=[T']_{a,b}\). We also have the image

$$\begin{aligned}{}[\partial T]_{a,b}=\delta \left( [T]_{a,b}\right) \in H^{a,b}_0({\mathcal {A}}) \end{aligned}$$
(8)

under the connecting homomorphism from the long exact sequence (2). The elements \([T]_{a,b}\) and \([\partial T]_{a,b}\) are the one-dimensional version of the classifiers referred to in the introduction.

In the case of two-dimensional image correspondence, there are a few differences. First, the domain of an image is now \({\mathcal {D}}_{1}={\mathcal {D}}_{2}= [0,1]\times [0,1]\) as in “Image Correspondences”. We would then expect the space \({\mathcal {X}}\) from “Illustration in the Continuous Case” to have dimension 4. But by the epipolar condition that corresponding points have the same y-coordinate, we instead take

$$\begin{aligned} {\mathcal {X}}=\left\{ (p,q) \in {\mathcal {D}}_{1}\times {\mathcal {D}}_{2}: p_2=q_2\right\} , \end{aligned}$$

which is three-dimensional. The preimage of the boundary \(\partial {\mathcal {D}}_{1}\) is now a cylinder, \({\mathcal {A}}=\pi ^{-1}(\partial {\mathcal {D}}_{1})=S^1 \times [0,1]\), where \(\pi (p,q)=p\), and \(S^1\) is identified with the boundary of the square. The dimensions of the persistent homology groups are moved up by one dimension,

$$\begin{aligned}{}[T]_{a,b} \in H_2({\mathcal {X}},{\mathcal {A}}),\quad [\partial T]_{a,b}\in H_1({\mathcal {A}}). \end{aligned}$$

We have found that the second class \([\partial T]_{a,b}\) is favorable in practice, for one thing because homology in dimension one is a smaller computation. In examples of correspondence, we will be interested in the persistence diagram of the filtered vector space containing this class, which is

$$\begin{aligned} {{\,\mathrm{im}\,}}\left( H_2 ({\mathcal {X}},{\mathcal {A}}) \rightarrow H_1({\mathcal {A}})\right) = \ker \left( H_1({\mathcal {A}})\rightarrow H_1({\mathcal {X}})\right) . \end{aligned}$$
(9)

Its elements may be thought of as classes of correspondences on the boundary \(\partial {\mathcal {D}}_{1}\), which may be extended to the interior, and therefore map to zero in \(H_1({\mathcal {X}})\). The setup of each of our examples in “Practical Examples” is to present a correspondence problem, exhibit the persistence diagram associated to the filtered vector space on the right side of (9), and show that features of the diagram, i.e. long bars, correspond to pertinent features of the image pair.
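For completeness, the equality in (9) is just exactness of the sequence (2) at \(H_1({\mathcal {A}})\). Writing \(\iota _*\) for the map induced by the inclusion \({\mathcal {A}}\subset {\mathcal {X}}\) (a symbol introduced here only for this remark), the relevant portion of (2) reads

$$\begin{aligned} H_2({\mathcal {X}},{\mathcal {A}}) \xrightarrow {\delta } H_1({\mathcal {A}}) \xrightarrow {\iota _*} H_1({\mathcal {X}}),\qquad {{\,\mathrm{im}\,}}\delta =\ker \iota _*, \end{aligned}$$

so a class on the boundary is of the form \([\partial T]\) for some relative 2-class exactly when it becomes trivial in \(H_1({\mathcal {X}})\).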

Definition of the Filtered Complex

The preceding section shows how persistent homology classes can in principle be used to classify image correspondences up to continuous deformation, if the relative persistent homology groups \(H^{a,b}_i({\mathcal {X}},{\mathcal {A}})\) can be computed in practice. In this section, we construct a simplicial complex X representing \({\mathcal {X}}\), together with a filtration function \(f:X\rightarrow {\mathbb {R}}_{\ge 0}\) in place of d. We also define a simplicial complex K associated to a triangulation of the base \({\mathcal {D}}_{1}\), and a complex \(A\subset X\) representing \({\mathcal {A}} \subset {\mathcal {X}}\), which is also filtered by f.

Choosing a simplicial complex to represent a space is an interesting problem in general. The motivation for the construction of this paper is that it is fibered over the base as a complex, with a map \(X\rightarrow K\) representing \(\pi :{\mathcal {X}} \rightarrow {\mathcal {D}}_{1}\), so that a correspondence \(T\) determines a section \(\varphi :K\rightarrow X\). It is therefore possible to define

$$\begin{aligned}{}[\varphi ]_{a,b}\in H_2^{a,b}(X,A),\quad [\partial \varphi ]_{a,b} \in H_1^{a,b}(A) \end{aligned}$$

analogous to (7) and (8). A byproduct of doing this is that X does not come from a triangulation of \({\mathcal {X}}\), nor is it embeddable in \({\mathbb {R}}^3\).

To do this, begin with a triangulation \(p:|K|\rightarrow {\mathcal {D}}_{1}\), and let \(\partial K \subset K\) denote the boundary, as in “Image Correspondences”. The underlying complex of X is given as follows:

  1. Choose some natural numbers \(N_i\) for every vertex \(i\in K_0\), and define the set of 0-simplices as

    $$\begin{aligned} X_0=\left\{ (i,j):i\in K_0,\ 1\le j \le N_i\right\} . \end{aligned}$$

  2. Include every 2-simplex \(\Delta =[(i_0,j_0),(i_1,j_1),(i_2,j_2)]\) for which \([i_0,i_1,i_2]\) is a 2-simplex in \(K_2\), and call these the horizontal faces of X.

  3. Add every 3-simplex \([(i_0,j_0),(i_1,j_1),(i_2,j_2),(i_3,j_3)]\) satisfying

    (a) There are only three distinct elements in \(\{i_0,i_1,i_2,i_3\}\), and they are the vertices of a 2-simplex in K.

    (b) If \(i_a=i_b\) for \(a\ne b\), then \(j_a=j_b\pm 1\).

    In other words, we have added a 3-simplex whenever it includes two horizontal faces differing only in one coordinate by a j-value of one.

  4. Include all the faces of every simplex added thus far, making X a legitimate simplicial complex. The 2-simplices that have been added as a result will only contain two distinct i-values, and will be called vertical faces.

The effect of adding the 3-simplices in item 3 is to “fill in” the space between correspondences. Notice that there is an obvious surjective map \(\pi : X\rightarrow K\) of complexes

$$\begin{aligned} \pi \left( \left[ (i_0,j_0),...,(i_k,j_k)\right] \right) =\left[ i_0,...,i_k\right] , \end{aligned}$$

that forgets the j-values. Let \(A =\pi ^{-1}(\partial K)\) be the subcomplex of X whose i-values lie in the boundary \(\partial K \subset K\).
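The combinatorics of items 1 to 3 can be sketched in a few lines. The code below is an illustration under assumed names (`build_complex`, `project`) and an assumed input format, not the authors' implementation; it enumerates vertices, horizontal faces and 3-simplices, and leaves the remaining faces of item 4 implicit.

```python
from itertools import product

def build_complex(triangles, levels):
    """Enumerate simplices of X following items 1-3.

    triangles: list of 3-tuples of vertex ids of K (the 2-simplices of K).
    levels: dict mapping each vertex id i of K to the number N_i.
    Returns (vertices, horizontal, solids).
    """
    vertices = [(i, j) for i in sorted(levels) for j in range(1, levels[i] + 1)]
    horizontal, solids = [], []
    for tri in triangles:
        for js in product(*(range(1, levels[i] + 1) for i in tri)):
            face = tuple(zip(tri, js))
            horizontal.append(face)
            # Each 3-simplex joins this face to the horizontal face in which
            # exactly one j-value is raised by one; generating it from the
            # lower face only ensures it is listed once.
            for m, i in enumerate(tri):
                if js[m] + 1 <= levels[i]:
                    solids.append(face[:m + 1] + ((i, js[m] + 1),) + face[m + 1:])
    return vertices, horizontal, solids

def project(simplex):
    """The map pi: forget the j-values, collapsing repeated i-values."""
    return tuple(sorted({i for i, _ in simplex}))
```

For a single triangle of K with \(N_i=2\) at each vertex, this gives 6 vertices, 8 horizontal faces and 12 three-dimensional simplices, each of which projects to the same triangle.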

We next define a filtration on this complex. For every horizontal 2-simplex, define

$$\begin{aligned}&f\left( \left[ \left( i_0,j_0\right) ,\left( i_1,j_1\right) ,\left( i_2,j_2\right) \right] \right) =\nonumber \\&\min _{j_a-1 \le s_a \le j_a} d\left( \left[ p_0,p_1,p_2\right] ,\left[ q_{p_0}(s_0),q_{p_1}(s_1),q_{p_2}(s_2)\right] \right) , \end{aligned}$$
(10)

where d is the choice of distance measure from “Image Correspondences”. On every 3-simplex, we define the value of f to be the maximum of the values on the two horizontal 2-simplices that are its faces. For every remaining simplex, the value is inductively defined as the minimum over all simplices of which it is a face, so that \(f(\Delta ')\le f(\Delta )\) whenever \(\Delta '\) is a face of \(\Delta\). We also obtain a filtration on A by restriction.

As described in “Illustration in the Continuous Case”, we are interested in the persistence diagram of the kernel of the map induced by the inclusion

$$\begin{aligned} \ker \left( H_1(A)\rightarrow H_1(X)\right) . \end{aligned}$$
(11)

To generate a persistence diagram in JavaPlex, we will make use of a workaround introduced in [3], which was used to study (11) in the case where A is the ideal Klein bottle, and X is a filtered complex representing a dataset of natural images. In this setup, we select a parameter \(t_0>0\), and define \(X_{t_0}\) to have the same underlying complex as X, but with the persistence values of all interior simplices \(\Delta \in X-A\) (which includes all horizontal simplices) shifted so that they begin at \(t_0\). This encodes the map \(A\rightarrow X\) into a single complex. We have found this approach to be sufficient for our purposes, though in future applications we expect to study the kernel in (9) directly, for instance using persistence for kernels and images in Dionysus [7].

We now describe how to interpret the persistence diagram of \(X_{t_0}\) in terms of image correspondences. For each image pair, the persistence diagram will show the following types of bars:

  1. Multiple short bars: these may be disregarded as noise.

  2. Multiple long bars that begin to the right of the chosen offset parameter \(t_0\): these represent partial solutions in some subregion of the interior that do not extend to the boundary values.

  3. Long bars with left endpoint slightly greater than zero and right endpoint slightly greater than \(t_0\): these represent elements of the kernel (11), which come from correspondences of the entire source image. In other words, they represent true solutions to the correspondence problem.

  4. Even longer bars whose right endpoint is significantly greater than \(t_0\): these represent correspondences near the periphery of the source image that do not extend to the entire image, and so are not in the kernel (11). In other words, they are partial solutions which solve the correspondence problem near the boundary of the image, but which do not match on some regions in the interior.
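The four cases above can be turned into a simple rule of thumb for reading a diagram. In the sketch below, the thresholds `min_length` (what counts as a short bar) and `tol` (what counts as slightly greater than \(t_0\)) are hypothetical tuning parameters that the text does not specify, and the label strings are likewise ours.

```python
import math

def classify_bar(birth, death, t0, min_length, tol):
    """Heuristic reading of one bar in the diagram of X_{t0}.

    min_length: bars shorter than this are treated as noise (assumed).
    tol: a right endpoint in [t0, t0 + tol] counts as slightly greater
         than t0 (assumed).
    """
    if death - birth < min_length:
        return "noise"                      # type 1
    if birth >= t0:
        return "interior partial match"     # type 2
    if t0 <= death <= t0 + tol:
        return "full correspondence"        # type 3: element of the kernel (11)
    if death > t0 + tol:
        return "boundary-only match"        # type 4
    return "unclassified"                   # long bar that dies before t0
```

A bar with infinite right endpoint that is born before \(t_0\) falls under the fourth case.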

Pipeline

Fig. 3

The construction of our proposed complex. The source and target images are slightly skewed from one another, to suggest different camera angles of the same scene. The middle of the five triangles in the target image of (d) will have the lowest value of \(d({\mathcal {T}},{\mathcal {T}}')\)

Here we give an explicit description of the complex \(X_{t_0}\) that determines classes of correspondence between two images. We assume that we are given two images \(I_{1}\) and \(I_{2}\), as indicated in Fig. 3a. The construction is as follows:

  1. Let L be a collection of evenly-distributed landmark points in \({\mathcal {D}}_{1}\), such as a hexagonal lattice (Fig. 3b).

  2. Associated with each landmark point \(p\in L\), we have a collection of possible images of that landmark point Q(p) in the domain of \(I_{2}\), which are the possible q-values from the last section. The set Q could be infinite or finite, and is determined by some prior knowledge about the camera placement or other initial pre-processing. In our example, we have two cameras that are horizontally aligned, so we restrict Q(p) to a horizontal interval, which we discretize to obtain a finite complex, as shown in Fig. 3c. The set Q partially characterizes the “niceness” of the correspondence map that we seek, by restricting the plausible locations where we think p could land under a “nice” mapping \(T\). The 0-simplices of the complex X are the union \(\bigcup _{p\in L}Q(p)\).

  3. Build a Delaunay triangulation of L. For each triangle \({\mathcal {T}}=(x,y,z)\) in the triangulation, and for every triangle \({\mathcal {T}}'\) of the form \((u,v,w)\), with \(u\in Q(x)\), \(v\in Q(y)\), and \(w\in Q(z)\), we do the following:

    • If the shape of \({\mathcal {T}}'\) is “similar” to that of \({\mathcal {T}}\), then add a 2-simplex to X (by “similar”, we mean for example that the lengths of the perimeters of \({\mathcal {T}}\) and \({\mathcal {T}}'\) are not too different, and that their orientations are the same). Its persistence value is \(d({\mathcal {T}},{\mathcal {T}}')\), where \(d(\cdot ,\cdot )\) is the distance measure in Eq. (4), or any similar counterpart, such as earth mover’s distance. We have done this instead of the minimum taken in Eq. (10) only in the interest of speed, and because in our examples the values do not vary much in that domain, making this an acceptable approximation. In more sensitive applications we would expect to produce each value by solving an actual optimization problem in a highly localized domain. This also adds the constituent 1- and 0-simplices, whose persistence values are not defined yet.

    This process is shown in Fig. 3d.

  4. Set the persistence values of the lower-dimensional simplices in the natural way: each 1-simplex has a persistence value equal to the minimum persistence value among all 2-simplices containing it, and each 0-simplex has a persistence value equal to the minimum persistence value among all 1-simplices containing it.

  5. Choose a value \(t_0>0\) which is somewhere between the expected left and right endpoints of the important bars in the persistence diagram. For each simplex \(\Delta\) (of any dimension), if \(\Delta\) contains an interior vertex, add \(t_0\) to its persistence value. Call the complex with the new persistence values \(X_{t_0}\).

Practical Examples

In this section we present three example applications, following the description of the pipeline from “Pipeline”. Each example contains an image pair and a description of the correspondence problem, technical information such as the choice of the dissimilarity function and the value of the \(t_0\) parameter, and the persistence diagram for the complex \(X_{t_0}\) described at the end of “Definition of the Filtered Complex”.

Example: Identical Black Discs

Our first example is primarily a conceptual warmup example which illustrates some of the interesting features in Example 4.3. We choose our source and target to be the two identical black opaque circles shown in Fig. 4.

Fig. 4

Two identical source and target images

The source image is triangulated by a complex K with 100 equilateral triangles. Our distance function is the \(L^2\)-distance from (4). We remove all 2-simplices of X which are either orientation reversed, or are sheared to a width of more than double that of the base triangle, by setting the persistence score to infinity. The \(t_0\) parameter is chosen to be \(4.6 \times 10^4\). In place of the desired \(\min\) in Eq. (10), we take a rough approximation by sampling only a single value, for speed, as described in “Pipeline”.

Fig. 5

The JavaPlex barcode diagram for the images of black discs in Fig. 4

The persistence diagram is shown in Fig. 5. The important information is that there are three long bars, shown bolder in the picture, beginning before \(t_0\). Two of these continue well beyond \(t_0\), while the lower one stops almost immediately after it. The bar that stops near \(t_0\) represents correspondences which correctly match points in the entire image. The two longer bars correspond to two types of correspondence of the boundary which cannot be extended to the entire picture, represented by those which carry the boundary of the source entirely into the white space to the left of the black disc in the target, and those that are entirely on the right.

Example: Dot Mesh

The second correspondence problem is to find a mapping \(\varphi\) that preserves vertical coordinates between the point clouds shown in Fig. 6a, b. Although not remotely apparent to the naked eye, there are actually two correspondences between the two, as illustrated in Fig. 6c–e. The two triangulations show that there are essentially two different ways to map the source points into the second.

Our distance measure is defined by

$$\begin{aligned} E(\varphi )=\max _{p\in A} \min _{q\in B} \Vert p-q\Vert . \end{aligned}$$
(12)

For speed, we have simply replaced the maximum by the value at a single sample, as we did for the circle example. The 2-simplices in X whose width is more than double that of the associated triangles in the domain of the source image \({\mathcal {D}}_{1}\) are effectively removed by setting their persistence score to infinity, as are those triangles in which the orientation has been reversed. The \(t_0\) parameter is set to 180. Figure 7 shows the complexes \(X^{a}\) for a few choices of a. Notice that the interior triangles begin appearing in Fig. 7c, once the \(t_0\) parameter is passed.
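The expression in (12) has the form of a directed Hausdorff distance between two finite point sets. The sketch below computes it directly, before any of the speed-ups mentioned above; it assumes that \(A\) and \(B\) in (12) denote finite point sets (presumably the dots in a source triangle and in its candidate image, which the text does not spell out), and the function name is ours.

```python
import math

def directed_hausdorff(points_a, points_b):
    """max over p in A of the distance from p to its nearest q in B."""
    return max(min(math.dist(p, q) for q in points_b) for p in points_a)
```

Note that the quantity is not symmetric in its two arguments: for `A = [(0, 0), (1, 0)]` and `B = [(0, 0), (3, 0)]` it equals 1 from A to B and 2 from B to A.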

Fig. 6

Two different correspondences of a mesh filled with a point cloud into the same target

Fig. 7

Some diagrams showing the complex \(X^{<a}\) for a few values of a for the dot mesh shown in Fig. 6

Fig. 8

The JavaPlex barcode diagram for the images in Fig. 6

The persistence diagram in Fig. 8 shows two long bars, which detect the two possible types of correspondence shown in Figs. 6d, e.

Example: Pennies on a Tablecloth

Next consider two real images of a penny sitting on a tablecloth with an interesting pattern shown in Fig. 9. We began by quantizing color values using a filter, and then used a Wasserstein distance function. We eliminated highly skewed triangles using the same criteria from Examples 4.1 and 4.2. The parameter is \(t_0=2.6 \times 10^4\).

Fig. 9

Two pictures of a penny on a tablecloth

Fig. 10

The JavaPlex barcode diagram for the images in Fig. 9

The persistence diagram is shown in Figure 10. We have removed all barcodes of length \(<30\) for visual purposes. The first Betti number output appears similar to the first example, which is to be expected. However, beyond the extra noise due to the fact that we have used a picture from real life, this image has additional features due to the repeating pattern of the tablecloth:

  1. We now have interesting Betti zero features (connected components of X) corresponding to the different translations of the tablecloth pattern. They represent the fact that the boundary of the source image can be matched to three different positions on the target.

  2. The Betti one features are the same as in Example 4.1, but for a different reason. This time every correspondence of the boundary encircles the penny, but there are three different classes of them due to the pattern. If the left and right borders of the frame in the source are moved in even slightly, so that the border is in the black stripe region at the same height as the source penny, the two additional Betti one barcodes will disappear. This is because the penny in the target will raise the persistence value even for boundary correspondences.

A Deformation Invariance Result

We now define the classes associated to a correspondence \(\varphi\) described in “Illustration in the Continuous Case”, and prove that they are invariant under certain smooth deformations in Theorem 1. The existence of a theorem such as this one is the reason the discussion in each example in "Practical Examples" is valid, as they are implicitly describing classes of correspondences which are equivalent up to homotopy.

Let us call \(\varphi\) an a-correspondence if \(E(\varphi )<a\), and denote the set of these elements by \({{\,\mathrm{\Gamma }\,}}(K,a)\). Let us also define \(X^{<a}\) in the same way as \(X^a\) but with strict inequality, and the same for \(A^{<a}\). Assume the dissimilarity function d is continuous as a function of the six vertices of a pair of triangles, so that we may regard \({{\,\mathrm{\Gamma }\,}}(K,a)\) as an open subset \(U\subset {\mathbb {R}}^N\), where N is the number of landmark points, i.e. zero simplices in \(K_0\).

We will say that \(\varphi ,\varphi '\) are b-equivalent, and write \(\varphi \sim _b \varphi '\), if there exists a continuous function \(h:[0,1] \rightarrow {{\,\mathrm{\Gamma }\,}}(K,b)\) satisfying \(h(0)=\varphi\) and \(h(1)=\varphi '\). In other words, they are in the same connected component of \(\Gamma (K,b)\subset {\mathbb {R}}^N\), since path connected and connected are the same for open subsets. We will be interested in studying the set \({{\,\mathrm{\Gamma }\,}}(K,a)/{\sim _b}\) of a-correspondences up to b-equivalence for \(a\le b\), the set being empty otherwise. For instance, \({{\,\mathrm{\Gamma }\,}}(K,a)/{\sim _a}\) is just the discrete set of connected components of the open set U, whereas choosing \(b>a\) results in collapsing more of these components. If \(\varphi \in {{\,\mathrm{\Gamma }\,}}(K,a)\), the equivalence class of \(\varphi\) will be written \({\tilde{\varphi }}_{a,b}\in {{\,\mathrm{\Gamma }\,}}(K,a)/{\sim _b}\).
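To illustrate how raising b collapses components, here is a toy sketch for the simplest case \(N=1\), where the energy is sampled on a one-dimensional grid and a connected component of a sublevel set is a maximal run of consecutive grid points. The grid, the energy values and the function names are assumptions for illustration; a sampled grid only approximates the open sets discussed above.

```python
def sublevel_components(energy, threshold):
    """Label maximal runs of consecutive indices with energy < threshold.

    Returns one label per index, or None where energy >= threshold.
    """
    labels = []
    current = -1
    inside = False
    for e in energy:
        if e < threshold:
            if not inside:
                current += 1
                inside = True
            labels.append(current)
        else:
            inside = False
            labels.append(None)
    return labels

def count_classes(energy, a, b):
    """Number of classes of a-correspondences up to b-equivalence, for a <= b."""
    if a > b:
        raise ValueError("expected a <= b")
    comp_b = sublevel_components(energy, b)
    return len({comp_b[i] for i, e in enumerate(energy) if e < a})

# Made-up energy profile with three local minima.
toy_energy = [5, 1, 5, 2, 7, 1, 5]
```

With this profile, `count_classes(toy_energy, 3, 3)` counts the three separate minima, while larger values of b merge them.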

For any correspondence \(\varphi \in {{\,\mathrm{\Gamma }\,}}(K,a)\), we have an injective map of complexes \(s^{\varphi }: K \rightarrow X\) by

$$\begin{aligned} s^\varphi \left( \left[ i_0,...,i_k\right] \right) =\left[ \left( i_0,j_0\right) ,...,\left( i_k,j_k\right) \right] ,\quad j_a=\lfloor N_{i_a} \varphi \left( i_a\right) \rfloor . \end{aligned}$$
(13)

It is a section of \(\pi\), meaning \(\pi s^{\varphi }={{\,\mathrm{Id}\,}}\), which in particular implies that \(s^{\varphi }(\partial K)\subset A\). By (10), we can see that if \(\varphi \in {{\,\mathrm{\Gamma }\,}}(K,a)\), then \(s^\varphi (K)\subset X^{<a}\).
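The map in (13) and the section property can be sketched as follows. This assumes each \(\varphi (i)\) is a single real number in \([0,1)\) and that \(N_i\) is the number of candidate positions for vertex i; the dictionaries `phi` and `n_candidates` and the function names are hypothetical.

```python
import math

def section(simplices, phi, n_candidates):
    """Apply s^phi to each simplex [i_0,...,i_k] of K.

    Each vertex i is sent to the pair (i, j) with j = floor(N_i * phi(i)), as in (13).
    """
    def j(i):
        return math.floor(n_candidates[i] * phi[i])
    return [tuple((i, j(i)) for i in simplex) for simplex in simplices]

def project(simplex):
    """The projection pi: forget the second coordinate of each vertex."""
    return tuple(i for i, _ in simplex)

# A single triangle with made-up correspondence values.
K = [(0, 1, 2)]
phi = {0: 0.3, 1: 0.6, 2: 0.95}
n_candidates = {0: 4, 1: 4, 2: 10}
image = section(K, phi, n_candidates)
```

Applying `project` to every simplex of `image` recovers `K`, which is the identity \(\pi s^{\varphi }={{\,\mathrm{Id}\,}}\) at the level of simplices.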

Definition 1

For any a-correspondence \(\varphi \in {{\,\mathrm{\Gamma }\,}}(K,a)\), let

$$\begin{aligned}{}[\varphi ]_a=s^{\varphi }_*\left( [K]\right) \in H_2\left( X^{<a},A^{<a}\right) ,\quad [\partial \varphi ]_{a}=\delta \left( [\varphi ]_{a}\right) \in H_1(A^{<a}), \end{aligned}$$

where \(\delta\) is the connecting homomorphism from (2). Let \([\varphi ]_{a,b},[\partial \varphi ]_{a,b}\) denote their images in \(H^{a,b}_2(X,A), H^{a,b}_1(A)\), where in this section the persistent homology groups are defined using strict inequality, i.e. replacing \(X^{a},A^a\) with \(X^{<a},A^{<a}\).

Theorem 1

If \(\varphi ,\varphi '\in {{\,\mathrm{\Gamma }\,}}(K,a)\), and \(\varphi \sim _{b} \varphi '\), then \([\varphi ]_{a,b}=[\varphi ']_{a,b}\), and \([\partial \varphi ]_{a,b}=[\partial \varphi ']_{a,b}\). In particular, we have well-defined maps

$$\begin{aligned} {{\,\mathrm{\Gamma }\,}}(K,a)/{\sim _b}\rightarrow H^{a,b}_2(X,A) \rightarrow H^{a,b}_1(A), \end{aligned}$$

sending \({\tilde{\varphi }}_{a,b}\) to \([\varphi ]_{a,b}\), and \([\partial \varphi ]_{a,b}\), respectively. Furthermore, for any \(\varphi\), both of these classes are nonzero.

Corollary 1

If either \(H^{a,b}_2(X,A)\) or \(H^{a,b}_1(A)\) is trivial, then \({{\,\mathrm{\Gamma }\,}}(K,a)\) is empty.

We need a simple lemma:

Lemma 1

If p, q are in the same connected component of an open subset \(U\subset {\mathbb {R}}^N\), there is a continuous function \(h:[0,1]\rightarrow U\) satisfying

  (a) It connects the points \(h(0)=p\), \(h(1)=q\).

  (b) For every \(1\le i\le N\), the set of points \(t\in [0,1]\) with coordinate function \(h_i(t) \in {\mathbb {Z}}\) is finite.

  (c) No point in the path has two integer-valued coordinates simultaneously, i.e. \(h_i(t)\) and \(h_j(t)\) are not both integers for \(i\ne j\).

Proof

Since connected implies path connected for open sets, we may suppose there is a continuous function satisfying part (a), and by Whitney approximation, we may assume that it is smooth. Part (b) can easily be obtained using standard transversality results, noting that the set of points where the ith coordinate is integral is a co-dimension one sub-manifold of U, and a sub-manifold of [0, 1] of dimension zero is finite. See for instance [5]. Part (c) is then easy to obtain by a simple perturbation argument. \(\square\)

We now move on to the proof of Theorem 1.

Proof

Let \(U={{\,\mathrm{\Gamma }\,}}(K,b)\), regarded as an open subset of \({\mathbb {R}}^N\) as above, so that \(\varphi ,\varphi '\) define points p, q in the same connected component of U. Suppose we have a function h as in Lemma 1. Then applying the definition of \(j_k\) in (13) to h(s) gives functions \(j_k : [0,1]\rightarrow \{1,...,N_k\}\) such that \(j_k(s)\) has jumps from one natural number to an adjacent one at certain values of s. By the lemma, we may suppose that each \(j_k(s)\) has jumps at finitely many values of s, and that \(j_k(s)\) and \(j_l(s)\) never have simultaneous jumps for \(k\ne l\).

We now have a sequence \(\varphi =\varphi _0,...,\varphi _{n}=\varphi ' \in {{\,\mathrm{\Gamma }\,}}(K,b)\) by taking \(\varphi _i=h(s_i)\) where \(s_0=0\), \(s_n=1\), and \(s_i\) is any value between the ith and \((i+1)\)st jumps for \(1\le i \le n-1\). It now suffices to show that \([\varphi _i]_{a,b}-[\varphi _{i+1}]_{a,b}=0\) for all \(i \in \{0,...,n-1\}\). Given any such i, let k be the coordinate that jumps between \(s_i\) and \(s_{i+1}\). Then we have

$$\begin{aligned} s^{\varphi _i}_*([K])-s^{\varphi _{i+1}}_*([K])= \pm \partial \left[ {{\,\mathrm{St}\,}}\left[ \left( k,j_k\left( s_i\right) \right) ,\left( k,j_k\left( s_{i+1}\right) \right) \right] \right] \end{aligned}$$

where \([{{\,\mathrm{St}\,}}\Delta ]\) is the class of the star of \(\Delta\), in other words, the sum of all 3-simplices which have \(\Delta\) as a face, where \(\Delta\) is the 1-simplex appearing in the equation. By construction, each of these 3-simplices is contained in X, since the second coordinates differ by 1. They are also in \(X^{<b}\), as we may take \(\varphi _i\) as an upper bound for the min in (10).

Finally, we check that neither class vanishes. Since \(\pi i^{a,b} s^{\varphi }:K\rightarrow K\) is the identity map, we have

$$\begin{aligned} \pi _*([\partial \varphi ]_{a,b})= \pi _*(\delta ([\varphi ]_{a,b}))=\delta ([K])=[\partial K]\in H_1(\partial K), \end{aligned}$$

which is nonzero, so that \([\partial \varphi ]_{a,b}\ne 0\). Since \([\partial \varphi ]_{a,b}=\delta ([\varphi ]_{a,b})\), we must have \([\varphi ]_{a,b}\ne 0\) as well. \(\square\)
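The subdivision step of the proof, in which a path is broken into a sequence of correspondences whose indices change one coordinate at a time, can be sketched as follows. This assumes the path is given as a finite list of sample points in \({\mathbb {R}}^N\) and that indices are computed by the floor formula in (13); the function names and sample paths are hypothetical.

```python
import math

def index_vector(point, n_candidates):
    """Indices j_k = floor(N_k * x_k) for each coordinate of a point."""
    return tuple(math.floor(n * x) for n, x in zip(n_candidates, point))

def jump_sequence(path, n_candidates):
    """Distinct consecutive index vectors encountered along a sampled path."""
    seq = []
    for point in path:
        idx = index_vector(point, n_candidates)
        if not seq or seq[-1] != idx:
            seq.append(idx)
    return seq

def is_elementary(seq):
    """True if consecutive index vectors differ in exactly one coordinate, by 1."""
    for u, v in zip(seq, seq[1:]):
        diffs = [abs(a - b) for a, b in zip(u, v) if a != b]
        if diffs != [1]:
            return False
    return True

# A path whose coordinates cross grid lines one at a time.
good_path = [(0.10, 0.10), (0.20, 0.10), (0.30, 0.10), (0.30, 0.20),
             (0.30, 0.30), (0.40, 0.30), (0.55, 0.30)]
# A path that crosses two grid lines between consecutive samples.
bad_path = [(0.2, 0.2), (0.3, 0.3)]
```

The second path violates the analogue of condition (c) of Lemma 1 at the sampling resolution, which is the situation the perturbation argument is meant to rule out.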