
1 Introduction

Shape is a powerful feature for recognition as psychophysical studies [1] have shown. As an important feature carrier, shape contour is widely used in object detection and matching methods [2, 3]. This is because shape contour is invariant to lighting conditions and variations in object colour and texture. More importantly, it can efficiently represent image structures with large spatial extents [3].

Since shape contour is sensitive to object deformation, developing robust contour-based shape descriptors invariant to rotation, translation, scaling, and transformation remains a challenging task. Although skeleton-based descriptors capture both shape features and the topological structure of the original objects, they require heavy computation for skeleton pruning [4]. Moreover, the subsequent matching process depends strongly on the quality of branch pruning. Recently, researchers have employed contour segments [2, 5, 6] for shape matching, since contour segments are invariant to shape deformation. However, generating applicable contour segments requires appropriate partition points. Traditionally, skeleton endpoints can serve as partition points, since they are stable and cover most visual parts of the original shape. Nevertheless, as discussed above, this again relies heavily on skeleton pruning.

The main goal of this paper is to present a method that extracts exact contour partition points and related robust descriptors for shape matching. Our proposed method is easy to implement and can be computed efficiently. To achieve this, we propose a novel method for generating contour partition points that is robust and stable under shape deformation. Based on these points, we generate point contexts and contour segments as shape descriptors. For object matching, we apply the point context-based matching and the contour segment-based matching independently. By fusing the two matching results with an integrated Gradient Hill Climbing [7] and Simulated Annealing [8] method, we calculate the shape distance and apply it in a shape retrieval scenario. Finally, we employ a modified mutual kNN graph on the distance matrix to improve the accuracy of shape retrieval. The advantages of our proposed approach include (i) low computational complexity, (ii) the inclusion of only a limited number of sampling points instead of all contour points for the point context, and (iii) the impressive robustness of the method in an object retrieval scenario.

2 Related Work

Contour-Based Shape Matching: Many contour-based approaches achieve state-of-the-art performance by utilising contour information only. For example, Shotton et al. [9] present a categorical object detection scheme that uses only local contour-based features and is realised in a partly supervised learning framework. Nguyen et al. [10] propose a shape-based local binary descriptor for object detection that has been tested in the task of detecting humans in static images. In [11], an algorithm for partial shape matching with mildly non-rigid deformations using Markov chains and the Monte Carlo method is introduced. Based on contour points, Belongie et al. proposed the shape context [12] descriptor for object matching. Shape context appears to be one of the best-performing shape descriptors and is certainly the most popular one. Using the same basic geometric units as shape context, Ma and Latecki [6] proposed a shape descriptor that is particularly suitable for matching edge fragments in images to model contours of target objects. Inspired by the Bag of Words model, Wang et al. [13] introduced the Bag of Contour Fragments as a higher-level shape descriptor. Combining it with a linear SVM, the authors were able to achieve impressive results on existing datasets.

Skeleton-Based Shape Matching: Though the skeleton (or medial axis) can be generated from the shape contour with the maximal disk model [14], compared to contour matching methods, skeleton matching approaches feature lower sensitivity to occlusion, limb growth and articulation [15]. However, they are computationally more complex [16] and have not yet been fully successfully applied to real images. Baseski et al. [17] present a tree-edit-based shape matching method that uses a coarse skeleton representation. Their dissimilarity measure gives a better understanding of within-group versus between-group separation, which mimics the asymmetric nature of human similarity judgements. To the best of our knowledge, the best-performing skeleton-based object matching algorithm has been proposed by Bai et al. [18]. Their main idea is to match skeleton graphs by comparing the geodesic paths between skeleton endpoints.

Shape Matching by Skeletons and Contours: Besides the utilisation of only contour or only skeleton features for shape matching, some fused shape descriptors combining contours and skeletons have been proposed. Bai et al. [5] combine object contour and skeleton properties for shape classification. They extract contour segments using the Discrete Curve Evolution [4]. However, their approach works in a supervised pattern recognition mode, and multiple training examples of an object class are necessary for modelling. Zeng et al. [19] combine properties of skeletons and boundaries for general shape decomposition. Unfortunately, this method is rotation variant and highly sensitive to shape deformation. Yang et al. [2] proposed an approach for object retrieval that uses contour segment and skeleton matching for shape similarity computation. They employ skeleton endpoints for contour partitioning and propose a new feature extraction technique for contour segments. Though this method achieved excellent results, it still relies heavily on skeleton pruning.

3 Shape Representation

Partition Point: Here we describe the contour partitioning approach employed for separating the shape of an arbitrary object \(\varOmega \) into a set of sub-parts. These sub-parts as well as the partition points \(\varvec{p}_i\) are required to perform the matching described below. The general idea is to detect contour regions with a high curvature relative to the overall shape trend. Therefore, one or more reference points (\(\varvec{x}_i\)) inside the shape are selected to compute the distances between all contour points \(\partial \varOmega \) and their closest \(\varvec{x}_i\). By sampling the contour clockwise and arranging these values linearly, a signal \(\varvec{s}\) is generated that can be used to detect peaks in its second-order derivative. By backtracking these peaks, the desired partition points are obtained.

Therefore, the contour is first extracted and subsequently converted to a polygon \(\mathcal {P}\). For the polygonisation, the well-known Douglas-Peucker technique is recursively applied to \(\partial \varOmega \). Here, only a small tolerance value \(\epsilon ^{(DP)}\) is used, which drastically reduces the noise-to-signal ratio without removing significant characteristics of the shape. Figure 1 illustrates this on the example image of the bird using two different tolerance values.
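The polygonisation step can be sketched as follows: a minimal, self-contained Douglas-Peucker implementation for illustration, where the tolerance \(\epsilon ^{(DP)}\) corresponds to the `eps` argument and the example contour coordinates are hypothetical.

```python
import math

def douglas_peucker(points, eps):
    """Recursively simplify a polyline: a point survives only if it deviates
    from the chord between the endpoints by more than eps."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy) or 1.0
    # Find the interior point with maximal perpendicular distance to the chord.
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        x0, y0 = points[i]
        d = abs(dy * (x0 - x1) - dx * (y0 - y1)) / norm
        if d > dmax:
            dmax, idx = d, i
    if dmax <= eps:
        return [points[0], points[-1]]
    # Split at the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right

contour = [(0, 0), (1, 0.05), (2, -0.04), (3, 2.0), (4, 0.0)]
print(douglas_peucker(contour, 0.1))
```

With a small `eps`, near-collinear jitter such as `(1, 0.05)` is removed while the salient corner at `(3, 2.0)` is preserved, which mirrors the behaviour described in the text.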

Fig. 1.

The original image and two versions created by different tolerance values.

Next, the reference points \(\varvec{x}_i\) have to be located inside the shape in order to sample the contour. They are detected by utilising the highly accurate fast marching method proposed by Hassouna et al. [20]. The result is a distance map whose values are generated based on a given initialisation (seed point(s)). In this context, \(\mathcal {P}\) is taken as initialisation, resulting in a distance map, or rather a time-crossing map, \(T\). The values in \(T\) can also be interpreted as contour contraction or expansion, respectively. This is illustrated in Fig. 2.

Fig. 2.

The figure shows the time-crossing maps of a bone, a person and a bird. They can also be interpreted as distance maps initialised with the objects’ contours.

The actual reference points are then discovered at maximum-value locations in \(T\). Therefore, a detection procedure iteratively localises candidate regions and validates them in terms of being a new reference point. This is done based on a dynamically adapted threshold \(\mu ^{(BG)}\) that is applied to \(T\) in order to remove background values and emphasise these candidate regions:

$$\begin{aligned} \small \mu ^{(BG)} = \phi (T(\varOmega )) - 2\cdot \psi (T(\varOmega )), \end{aligned}$$
(1)

where \(\phi (\cdot )\) is a function returning the maximum value \(\{v^\star \; | \; v^\star \in T, \forall v \in T\, : \, v^\star \ge v\}\) inside \(\varOmega \). In contrast to this, \(\psi (\cdot )\) denotes the standard deviation. Once the background pixels are removed, the remaining ones are clustered into disjoint regions \(\varvec{A}_i\), which are subsequently used to determine weighted centroids. The first iteration (\(t=0\)) is then completed by storing all \(\varvec{A}^{t=0}_i\) inside a binary mask \(\varvec{B}\) having the same size as the input image \(\varvec{I}\); the weighted centroids are collected in a set denoted by \(\mathcal {C}^{t=0}\). The next iterations proceed in the same way, but differ as follows: in each further iteration \(t\), the algorithm loops only over the reference points found previously (\(\mathcal {C}^{t-1}\)), where each of them is used as a seed point for the fast marching method. The resulting \(T^t_i\) (where \(i=[1,2,\ldots ,\otimes (\mathcal {C}^{t-1})]\), with \(\otimes (\cdot )\) being a function that returns the number of elements) is then applied (here: multiplied) to \(T^{t=0}\). The effect of \(T^{t} \circ T^{t=0}\) is illustrated in Fig. 3 using the example of a glass. As shown, the method would not return a reference point for the lower part when only using \(T^{t=0}\).
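One iteration of this candidate-region search can be sketched as follows. This is a simplified illustration, assuming the time-crossing map \(T\) is given as a NumPy array; the clustering into disjoint regions \(\varvec{A}_i\) is realised here with a plain 4-connected flood fill, and the fast marching computation itself is omitted.

```python
import numpy as np

def candidate_centroids(T, inside):
    """One iteration of the candidate-region search: threshold the
    time-crossing map at mu = max - 2*std (Eq. 1) and return the
    weighted centroid of each connected foreground region."""
    vals = T[inside]
    mu = vals.max() - 2.0 * vals.std()        # Eq. (1)
    fg = (T > mu) & inside
    seen = np.zeros_like(fg)
    centroids = []
    h, w = T.shape
    for y in range(h):
        for x in range(w):
            if fg[y, x] and not seen[y, x]:
                # 4-connected flood fill collecting one region A_i.
                stack, region = [(y, x)], []
                seen[y, x] = True
                while stack:
                    cy, cx = stack.pop()
                    region.append((cy, cx))
                    for ny, nx in ((cy+1, cx), (cy-1, cx), (cy, cx+1), (cy, cx-1)):
                        if 0 <= ny < h and 0 <= nx < w and fg[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                # Centroid weighted by the T values inside the region.
                wts = np.array([T[p] for p in region])
                pts = np.array(region, dtype=float)
                centroids.append(tuple((pts * wts[:, None]).sum(0) / wts.sum()))
    return centroids
```

Two separated bumps in \(T\) yield two candidate regions and hence two weighted centroids, one per region.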

Fig. 3.

The figure illustrates the working principle of detecting reference points iteratively. As shown, the lower point inside the glass would not be found by only using \(T^{t=0}\).

On the basis of \(T^{t} \circ T^{t=0}\), the threshold \(\mu ^{(BG)}\) is adapted and new candidate regions \(\varvec{A}^{t}_i\) are located. Subsequently, the centroids (\(\varvec{x}^{t}_i\)) are calculated for all \(\varvec{A}^{t}_i\). If one of these \(\varvec{x}^{t}_i\) resides in an already processed area, which means \(\varvec{B}(\varvec{x}^{t}_i)=1\), the point is rejected; otherwise \(\varvec{x}^{t}_i \in \mathcal {C}^{t}\) and \(\varvec{B}\) is updated. The procedure terminates if the current iteration returns an empty set of newly detected reference points (\(\mathcal {C}^t = \emptyset \)). The final set, in turn, is generated by \(\varvec{P} = \bigcup ^{t}_{j=0} \; \mathcal {C}^{t=j}\). Finally, all reference points have to pass two further constraints. First, none of the reference points is allowed to be too close to the centroid of the shape, which is ensured with the threshold \(\mu ^{(cog)}\). If exactly one \(\varvec{x}_i\) violates this constraint, all reference points except the responsible one are discarded. Second, in the presence of multiple reference points, the contour has to be split into sub-parts. During that separation, the algorithm ensures that each \(\varvec{x}_i\) has only one assignment; if two contour parts get assigned to one \(\varvec{x}_i\), the point is removed. Once the validation is completed, the fast marching approach is invoked a last time, initialised with the final set of reference points. From the resulting time-crossing map \(T^{(final)}\), only the values corresponding to the object’s contour are extracted. Afterwards, they are arranged in a vector \(\varvec{s}\) whose number of elements equals the number of pixels belonging to \(\partial \varOmega \). This data is responsible for the final contour partitioning; for this purpose, \(\varvec{s}\) is regarded as a 1D signal in the following. The goal is now to derive the second-order derivative of \(\varvec{s}\) to determine maximum and minimum points. By backtracking these turning points, contour pixels can be selected as partition points \(\varvec{p}_i\) for the object’s contour. Figure 4 shows both the \(T^{(final)}\) values corresponding to the contour and their signal plots. Since the bone structure yields two reference points, the contour has to be separated into two parts, which are shown in the first and second columns.

Fig. 4.

The figure shows the 1D signal that has been derived from the contour depicted on the image above. The colours on the contour line encode the distance from a pixel to its closest reference point.

A high noise level in the signal exacerbates the peak detection, resulting in an imprecise and noisy contour partitioning set. Thus, all subsequent matching methods would be faced with unnecessarily challenging input data. In order to alleviate artifacts and increase the robustness of the proposed approach, the signal is smoothed first. To this end, the Fast Fourier Transform (FFT) is employed to implement a low-pass filter (LPF). The FFT is applied since it is capable of transforming a periodic signal defined in the time or spatial domain into its frequency spectrum and vice versa. The signal is thereby approximated as a sum of sine and cosine functions, where the coefficients describe the power of the corresponding frequency in the input signal. Since noise (and only detail information) usually manifests itself in high frequencies, one can easily suppress noisy information without losing characteristic properties of the signal. This is shown in Fig. 5, where the signals of Fig. 4 are plotted once again, but this time after applying an LPF to them. Here, the LPF is realised by a Gaussian filter that is applied directly to the zero-shifted frequency domain.

Fig. 5.

From left to right, one can observe the smoothed signals of the lower left and upper right parts of the bone as well as of the person. These smoothed versions are obtained by applying an LPF established in the frequency domain of the signals. It has to be noted that the signals have been padded at the start and end to alleviate border artifacts.

Two things have to be noted here. First, before the signal is passed to the LPF, it has to be padded at the start and at the end to alleviate border artifacts. Second, the padding process behaves differently depending on the number of reference points. In the case of only a single reference point, the signal is repeated at its start and its end, since the contour was not separated and thus forms a circle. If there is more than one \(\varvec{x}_i\), the contour parts must not be repeated in the signal, since this would lead to a fake peak; hence, the start and the end are only extended by flipping \(\varvec{s}\). Finally, the first-order (\(\hat{\varvec{s}}'\)) and the second-order (\(\hat{\varvec{s}}''\)) derivatives of the smoothed (and discretised) signal \(\hat{\varvec{s}}\) are calculated to determine all zero locations in \(\hat{\varvec{s}}'\) with positive or negative curvature in \(\hat{\varvec{s}}''\). Figure 6 shows the second-order derivatives for the signal plots in Fig. 5. Please note that the computation of the second-order derivative only considers a signed binary version of the first-order derivative (\(\text {sign}(\hat{\varvec{s}}')\), where \(\text {sign}(\cdot )\) returns \(0\) if \(\hat{s}_i' = 0\), the value \(1\) if \(\hat{s}_i' > 0\) and \(-1\) otherwise).
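The smoothing and derivative computation described above can be sketched as follows. The padding follows the repeat/mirror rule from the text; the mapping of the Gaussian filter's sigma to a width in the frequency domain is an assumption made for this illustration.

```python
import numpy as np

def smooth_and_curvature(s, sigma=1.0, closed=True):
    """Low-pass filter the 1D contour signal s via a Gaussian applied in the
    zero-shifted frequency domain, then take the second-order derivative of
    the signed binary first derivative sign(s_hat')."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    # Padding: repeat the signal for a closed contour (single reference
    # point), mirror it otherwise, to alleviate border artifacts.
    padded = np.concatenate([s, s, s]) if closed else np.concatenate([s[::-1], s, s[::-1]])
    F = np.fft.fftshift(np.fft.fft(padded))
    freqs = np.arange(len(padded)) - len(padded) // 2
    # Assumed heuristic: scale sigma with the padded length so the cutoff is
    # expressed relative to the original signal length.
    lpf = np.exp(-(freqs ** 2) / (2.0 * (sigma * len(padded) / n) ** 2))
    s_hat = np.real(np.fft.ifft(np.fft.ifftshift(F * lpf)))[n:2 * n]
    d1 = np.sign(np.gradient(s_hat))   # signed binary first derivative
    d2 = np.gradient(d1)               # second-order derivative: spikes at turning points
    return s_hat, d2
```

For a clean sinusoidal signal, the spikes of the second-order derivative land exactly at the extrema, which are the turning points that get backtracked to contour partition points.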

Fig. 6.

Second-order derivatives corresponding to the smoothed versions of the signals shown in Fig. 4. Please note that for the computation of the second-order derivative only a signed binary version of the first-order derivative is used.

Theoretically, the desired partition points \(\varvec{p}_i\) can now easily be detected by localising all positions in \(\hat{\varvec{s}}''\) that are not equal to zero. In practice, this naive approach might still yield a limited number of false positives generated by noise. The problem is that it is not possible to find one Gaussian filter that fits all object instances, owing to their varying degrees of deformation. In order to tackle this problem, the second-order derivative is analysed in terms of neighbours whose distance to each other is too small. Figure 7 demonstrates such a situation, where the red circle indicates the problem. To detect such cases in \(\hat{\varvec{s}}''\), a threshold \(\mu ^{(sdiff)}\) is applied, defined as half of the standard deviation of all distances between adjacent peaks in \(\hat{\varvec{s}}''\).

Fig. 7.

The figure illustrates a situation where the sigma of the Gaussian filter has not been chosen appropriately. It is obvious that these artifacts can easily be determined by analysing the second-order derivative. The red circles indicate the problem.

If the distance between adjacent peaks is below \(\mu ^{(sdiff)}\), the difference of their heights (taken from \(\hat{\varvec{s}}\)) is compared to a threshold \(\mu ^{(hdiff)}\), which has to be chosen close to zero. If the height difference is also below this threshold, the sigma of the Gaussian filter is decreased by \(0.1\) and the smoothing procedure is repeated with the original \(\varvec{s}\). Experiments showed that only one or two iterations are required if the algorithm starts with \(\sigma = 1\). Finally, based on this validated set of peaks, the corresponding contour partition points are determined. Figure 8 shows the final results for the bone, the person and the bird.
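The peak validation and the adaptive decrease of sigma can be sketched as follows; the concrete value of \(\mu ^{(hdiff)}\) ("close to zero") is an assumed placeholder.

```python
import numpy as np

def validate_peaks(peaks, s_hat, sigma):
    """Check adjacent peaks: if two are closer than mu_sdiff (half the std
    of all adjacent-peak distances) and their heights in s_hat differ by
    less than mu_hdiff, signal that smoothing must be repeated with a
    smaller sigma. Returns (is_valid, sigma_to_use)."""
    mu_hdiff = 0.05                      # "close to zero" height threshold (assumed value)
    gaps = np.diff(peaks)
    if len(gaps) < 2:
        return True, sigma
    mu_sdiff = 0.5 * gaps.std()
    for i, g in enumerate(gaps):
        if g < mu_sdiff and abs(s_hat[peaks[i]] - s_hat[peaks[i + 1]]) < mu_hdiff:
            return False, sigma - 0.1    # decrease sigma and re-smooth
    return True, sigma
```

A pair of peaks that sit close together with nearly identical heights triggers re-smoothing, while well-separated peaks pass the validation unchanged.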

Fig. 8.

Final contour partitioning set of the bone, bird and the person.

Point Context: For each partition point \(\varvec{p}_{i}, i = 1,2,...,N\), we propose a point descriptor, Point Context, which has the same basic geometric unit as shape context [12]. The set of vectors originating from a partition point to all other points on the shape expresses the configuration of the entire shape relative to that reference point. However, different from the matching method proposed in [12], we only consider the feature vectors of partition points instead of sample points selected with roughly uniform spacing. Moreover, during object shape matching, we calculate the shape distance through a correspondence matrix between pairs of partition points without any model transformation process. We employ a diagram of log-polar histogram bins for computing the point context. More specifically, as defined in [12], for a partition point \(\varvec{p}_{i}\) on the shape contour, we compute a coarse histogram \(p_{i}\) of the relative coordinates of the remaining contour points,

$$\begin{aligned} \small p_{i,m} = \# \lbrace \varvec{q} \ne \varvec{p}_{i}:(\varvec{q}-\varvec{p}_{i}) \in bin(m) \rbrace , \end{aligned}$$
(2)

where \(\#\{\cdot \}\) denotes the number of elements in the set and \(\varvec{q} \in \partial \varOmega \setminus \{\varvec{p}_{i}\} \). We use five bins for \(\log {r}\) and 12 bins for \(\theta \). This histogram is defined to be the point context of \(\varvec{p}_{i}\). To achieve scale invariance, we normalise all radial distances by the mean distance between the contour point pairs in the shape. Thereafter, each partition point \(\varvec{p}_{i}\) can be represented as a 60-dimensional feature vector:

$$\begin{aligned} \small \varvec{p}_{i} = {(p_{i,1}, p_{i,2}, \ldots , p_{i, 60})}^{\text {T}}. \end{aligned}$$
(3)

The whole set of partition points can be represented as a set of feature vectors describing their point contexts:

$$\begin{aligned} \small \varvec{P} = \{\varvec{p}_{1}, \varvec{p}_{2}, \ldots , \varvec{p}_i, \ldots , \varvec{p}_{N} \}. \end{aligned}$$
(4)
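A point context histogram in the spirit of Eq. (2) can be sketched as follows. For brevity, the radial normalisation here uses the mean distance from \(\varvec{p}_{i}\) to the remaining points rather than the mean over all contour point pairs, and the log-spaced radial bin edges are assumed values.

```python
import numpy as np

def point_context(p, contour_pts, r_bins=5, theta_bins=12):
    """Log-polar histogram (cf. Eq. 2): count the remaining contour points
    falling into each of the 5 x 12 = 60 log-r / theta bins."""
    pts = np.asarray(contour_pts, dtype=float)
    d = pts - np.asarray(p, dtype=float)
    d = d[np.any(d != 0, axis=1)]            # q != p_i
    r = np.hypot(d[:, 0], d[:, 1])
    r = r / r.mean()                         # scale normalisation (simplified: mean distance to p)
    theta = np.arctan2(d[:, 1], d[:, 0]) % (2 * np.pi)
    # Log-spaced radial edges over [1/8, 2] of the mean distance (assumed range).
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), r_bins + 1)
    theta_edges = np.linspace(0, 2 * np.pi, theta_bins + 1)
    hist, _, _ = np.histogram2d(r, theta, bins=[r_edges, theta_edges])
    return hist.ravel()                      # 60-dimensional feature vector
```

For a point on a unit circle sampled with 36 contour points, all 35 remaining points fall inside the assumed radial range, so the histogram counts sum to 35.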

Contour Segment: Based on the partition points, the object contour is divided into \(N\) Contour Segments (CS). For each CS, a 10-dimensional feature vector \(\varvec{c}_{n}^\prime \) is extracted. Its first element \(c_{n,1}^\prime \) is the Euclidean distance between the two partition points of the contour segment, and the total number of pixels in the CS determines \(c_{n,2}^\prime \). These two features express how much a CS differs from a straight line. In order to distinguish contour segments of the type presented in [2], the area between the straight line connecting the CS endpoints and the contour segment itself is used as the third feature (\(c_{n,3}^\prime \)). Moreover, the ratio between \(c_{n,1}^\prime \) and \(c_{n,2}^\prime \) is used as \(c_{n,4}^\prime \) to evaluate the path variation.

Before computing the remaining features, each CS is transformed into a normalised vertical orientation (i.e., so that its endpoints are vertically aligned) to ensure rotation invariance of the object contour representation (see Fig. 9). Of the two possible results of such a normalising transformation, the one with the majority of points lying on the right side of the straight line connecting the endpoints is selected for further processing. For computing the further features \(c_{n,5}^\prime , c_{n,6}^\prime , \ldots , c_{n,10}^\prime \), we use the bounding box of the whole CS as well as the three equally high sub-boxes shown in Fig. 9:

$$\begin{aligned} \begin{array}{lll} c_{n,5}^\prime = \frac{h_{n}}{w_{n}} &{} \quad \quad c_{n,6}^\prime = \frac{h_{n,1}}{w_{n,1}} &{} \quad \quad c_{n,7}^\prime = \frac{h_{n,2}}{w_{n,2}} \\ c_{n,8}^\prime = \frac{h_{n,3}}{w_{n,3}} &{} \quad \quad c_{n,9}^\prime = \frac{w_{n,3} h_{n,3}}{w_{n,1} h_{n,1}} &{} \quad \quad c_{n,10}^\prime = \frac{w_{n,2} h_{n,2}}{w_{n,1} h_{n,1}} \\ \end{array} . \end{aligned}$$
(5)
Fig. 9.

CS bounding box and equally high sub-boxes (\(h_{n,1}=h_{n,2}=h_{n,3}\)) used for feature extraction.

Finally, we divide the elements of the feature vector by a half of the bounding box perimeter for scale invariance:

$$\begin{aligned} \small \varvec{c}_{n} = \frac{\varvec{c_n}^\prime }{w_n + h_n} = {(c_{n,1}, \ldots , c_{n, 10})}^{\text {T}} \end{aligned}$$
(6)

The object contour can now be represented as a set of feature vectors describing its contour segments:

$$\begin{aligned} \small \varvec{C} = \{\varvec{c}_{1}, \varvec{c}_{2}, \ldots , \varvec{c}_n, \ldots , \varvec{c}_{N} \} . \end{aligned}$$
(7)
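The first four (unnormalised) CS features can be sketched as follows; the area \(c_{n,3}^\prime \) is computed with the shoelace formula on the polygon closed by the chord between the two endpoints.

```python
import numpy as np

def cs_basic_features(seg):
    """First four (unnormalised) CS features: chord length between the two
    partition points, number of contour pixels, area between the chord and
    the segment (shoelace on the closed polygon), and their length ratio."""
    seg = np.asarray(seg, dtype=float)
    c1 = np.hypot(*(seg[-1] - seg[0]))   # c'_1: endpoint (chord) distance
    c2 = float(len(seg))                 # c'_2: pixel count along the segment
    x, y = seg[:, 0], seg[:, 1]
    # c'_3: shoelace area of the polygon formed by the segment plus the chord.
    c3 = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    c4 = c1 / c2                         # c'_4: path variation
    return np.array([c1, c2, c3, c4])
```

For a right-angle segment `(0,0) → (1,0) → (1,1)` the chord length is \(\sqrt{2}\), the pixel count is 3, and the enclosed triangle area is 0.5, matching the intended geometric meaning of the features.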

Finally, we fuse the partition point vectors and the contour segment vectors as the descriptor for the whole object shape contour \(\partial \varOmega \):

$$\begin{aligned} \small \partial \varOmega = \{ \varvec{P}, \varvec{C} \} . \end{aligned}$$
(8)

4 Shape Matching

The matching of two objects, including the computation of their similarity, is performed separately for their partition point and contour segment representations. After matching, we estimate the shape distance as the weighted sum of two terms: the partition point dissimilarity and the contour segment dissimilarity. For query-by-example shape retrieval, we build our approach on the underlying structure of the shape manifold, which is estimated from the shape distance scores between all shapes within a database.

Point Context-Based Matching: Let \(\partial \varOmega \) and \(\partial \varOmega ^\prime \) denote two shape contours to be matched, and let \(\varvec{p}_{n}\) and \(\varvec{p}^\prime _{k}\) be some partition points in \(\varvec{P}\) and \(\varvec{P}^\prime \), respectively. Let the numbers of the partition points in \(\partial \varOmega \) and \(\partial \varOmega ^\prime \) be \(N\) and \(K\), respectively, and \(N \le K\). As point contexts are distributions represented as histograms, the dissimilarity \(d(\varvec{p}_{n},\varvec{p}^\prime _{k})\) between \(\varvec{p}_{n}\) and \(\varvec{p}^\prime _{k}\) is estimated based on the \(\chi ^{2}\) test statistic:

$$\begin{aligned} \small \begin{array}{l} d(\varvec{p}_{n},\varvec{p}^\prime _{k}) = \tfrac{1}{2}\sum \limits _{m=1}^{M}\tfrac{[p_{n,m}-p^\prime _{k,m}]^{2}}{p_{n,m}+p^\prime _{k,m}} \end{array} . \end{aligned}$$
(9)

where \(M=60\) is the dimensionality of the feature space, and \(p_{n,m}\) and \(p^\prime _{k,m}\) denote the \(m\)-th bin of the normalised histograms at \(\varvec{p}_{n}\) and \(\varvec{p}^\prime _{k}\), respectively. For two shape contours, \(\partial \varOmega \) and \(\partial \varOmega ^\prime \), we use the lists of feature vectors for the clockwise ordered partition points. We compute all the pairwise dissimilarities between partition points and obtain a matrix:

$$\begin{aligned} \small {D(\varvec{P},\varvec{P}^\prime )} = \begin{pmatrix} d(\varvec{p}_1, \varvec{p}^\prime _1) &{} d(\varvec{p}_1, \varvec{p}^\prime _2) &{} \ldots &{} d(\varvec{p}_1, \varvec{p}^\prime _K)\\ d(\varvec{p}_2, \varvec{p}^\prime _1) &{} d(\varvec{p}_2, \varvec{p}^\prime _2) &{} \ldots &{} d(\varvec{p}_2, \varvec{p}^\prime _K)\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ d(\varvec{p}_N, \varvec{p}^\prime _1) &{} d(\varvec{p}_N, \varvec{p}^\prime _2) &{} \ldots &{} d(\varvec{p}_N, \varvec{p}^\prime _K)\\ \end{pmatrix} . \end{aligned}$$
(10)

Based on (10), we find the best-matched points from \(\varvec{P}\) to \(\varvec{P}^\prime \) with the Hungarian algorithm [12]. For each partition point \(\varvec{p}_{i}\) in \(\varvec{P}\), the Hungarian algorithm finds its corresponding partition point \(\varvec{p}^\prime _{j}\) in \(\varvec{P}^\prime \). Since \(\partial \varOmega \) and \(\partial \varOmega ^\prime \) may have different numbers of partition points, the total dissimilarity value should include a penalty for nodes that did not find any partner. To achieve this, we simply add additional rows with a constant value \(const\) to (10) so that \(\varvec{D}(\varvec{P},\varvec{P}^\prime )\) becomes a square matrix. The constant \(const\) is the average of all other values in \(\varvec{D}(\varvec{P},\varvec{P}^\prime )\).

The resulting dissimilarity values of the matched partition points are denoted as \(d_1, d_2, \ldots , d_N\), and the global partition point-based dissimilarity between \(\varvec{P}\) and \(\varvec{P}^\prime \) is calculated as follows:

$$\begin{aligned} \small d_\mathrm{points}(\varvec{P}, \varvec{P}^\prime ) = \frac{1}{N} \sum \limits _{n=1}^{N}d_n . \end{aligned}$$
(11)
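The point context-based matching described above can be sketched as follows. Instead of a full Hungarian implementation, a brute-force assignment over permutations serves as a stand-in, which yields the same optimum for the small numbers of partition points considered here.

```python
import itertools
import numpy as np

def chi2(p, q):
    """Chi-squared histogram distance (cf. Eq. 9); empty bin pairs contribute 0."""
    num = (p - q) ** 2
    den = p + q
    return 0.5 * float(np.sum(np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)))

def match_points(P, Pp):
    """Pad D(P, P') to a square matrix with the mean of its entries, then
    solve the assignment by brute force (a stand-in for the Hungarian
    algorithm for small point sets). Returns matched pairs and Eq. (11)."""
    D = np.array([[chi2(p, q) for q in Pp] for p in P])
    n, k = D.shape
    if n < k:                                    # penalty rows for unmatched points
        D = np.vstack([D, np.full((k - n, k), D.mean())])
    best = min(itertools.permutations(range(k)),
               key=lambda perm: sum(D[i, perm[i]] for i in range(k)))
    # Keep only the assignments of the real (non-padding) rows.
    pairs = [(i, best[i]) for i in range(n)]
    return pairs, float(np.mean([D[i, best[i]] for i in range(n)]))
```

Because the padding rows are constant, they do not influence which real rows map to which columns, so the penalty mechanism leaves the optimal matching of the genuine partition points intact.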

Contour Segment-Based Matching: Analogously to point context-based matching, let \(\partial \varOmega \) and \(\partial \varOmega ^\prime \) denote two shape contours to be matched, and let \(\varvec{c}_{n}\) and \(\varvec{c}^\prime _{k}\) be contour segments in \(\varvec{C}\) and \(\varvec{C}^\prime \), respectively. The number of contour segments equals the number of partition points; let these numbers be \(N\) for \(\partial \varOmega \) and \(K\) for \(\partial \varOmega ^\prime \), with \(N \le K\).

We introduce a matching cost measure for contour segments \(\varvec{c}_{n}\) and \(\varvec{c}_{k}^\prime \) belonging to different shape contours \(\partial \varOmega \) and \(\partial \varOmega ^\prime \):

$$\begin{aligned} \small d(\varvec{c}_{n},\varvec{c}_{k}^\prime ) = \frac{1}{M} \sum \limits _{m=1}^{M} \frac{\sigma _m \vert c_{n,m}-c_{k,m}^{\prime }\vert }{\sum \limits _{j=1}^{K}\vert c_{n,m}-c_{j,m}^{\prime }\vert } , \end{aligned}$$
(12)

where \(M=10\) is the dimensionality of the feature space and \(\sigma _m\) is the weight for each feature, obtained in an optimisation process explained in [2]. As one can see, the dissimilarity value between \(\varvec{c}_{n}\) and \(\varvec{c}_{k}^{\prime }\) does not depend only on these two contour segments: all CS of \(\varvec{C}^\prime \) are taken into consideration, whereby (12) does not fulfil the symmetry property, i.e., \(d(\varvec{c}_{n},\varvec{c}_{k}^{\prime }) \ne d(\varvec{c}_{k}^{\prime },\varvec{c}_{n})\) in general. However, it behaves in accordance with human perception: if the matching cost of \(\varvec{c}_{n}\) to the neighbours of \(\varvec{c}_{k}^{\prime }\) in \(\varvec{C^\prime }\) decreases, \(d(\varvec{c}_{n},\varvec{c}_{k}^{\prime })\) increases.
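The asymmetric cost of Eq. (12) can be sketched as follows; the default unit weights \(\sigma _m\) and the guard against a zero denominator are assumptions made for this illustration.

```python
import numpy as np

def cs_cost(cn, C_prime, k, sigma=None):
    """Matching cost of Eq. (12): the per-feature difference |c_n - c'_k| is
    normalised by the summed differences of c_n to *all* segments in C',
    which makes the measure asymmetric, as discussed in the text."""
    cn = np.asarray(cn, dtype=float)
    Cp = np.asarray(C_prime, dtype=float)
    M = len(cn)
    sigma = np.ones(M) if sigma is None else np.asarray(sigma, dtype=float)
    denom = np.abs(cn - Cp).sum(axis=0)          # sum_j |c_{n,m} - c'_{j,m}|
    denom = np.where(denom == 0, 1.0, denom)     # guard: identical feature column
    return float(np.mean(sigma * np.abs(cn - Cp[k]) / denom))
```

A segment identical to its partner yields cost 0, and if a *neighbouring* segment in \(\varvec{C}^\prime \) moves away from \(\varvec{c}_{n}\), the denominator grows and the cost to \(\varvec{c}_{k}^{\prime }\) shrinks, reflecting the perceptual behaviour described above.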

We arrange the contour segments of both objects in a clockwise way so that the objects can be represented by ordered lists of feature vectors:

$$\begin{aligned} \small \begin{array}{l} \varvec{C} = (\varvec{c}_{1}, \varvec{c}_{2}, \ldots , \varvec{c}_{n}, \ldots , \varvec{c}_{N}) \\ \varvec{C}^\prime = (\varvec{c}^\prime _{1}, \varvec{c}^\prime _{2}, \ldots , \varvec{c}^\prime _{k}, \ldots , \varvec{c}^\prime _{K}) \\ \end{array} . \end{aligned}$$
(13)

Using (12) we generate a matrix of matching cost between all CS in \(\varvec{C}\) and in \(\varvec{C}^\prime \):

$$\begin{aligned} \small {D(\varvec{C},\varvec{C}^\prime )} = \begin{pmatrix} d(\varvec{c}_1, \varvec{c}^\prime _1) &{} d(\varvec{c}_1, \varvec{c}^\prime _2) &{} \ldots &{} d(\varvec{c}_1, \varvec{c}^\prime _K)\\ d(\varvec{c}_2, \varvec{c}^\prime _1) &{} d(\varvec{c}_2, \varvec{c}^\prime _2) &{} \ldots &{} d(\varvec{c}_2, \varvec{c}^\prime _K)\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ d(\varvec{c}_N, \varvec{c}^\prime _1) &{} d(\varvec{c}_N, \varvec{c}^\prime _2) &{} \ldots &{} d(\varvec{c}_N, \varvec{c}^\prime _K)\\ \end{pmatrix} . \end{aligned}$$
(14)

The next steps are the same as for the partition points. In order to find an optimal match between contour segments from \(\varvec{C}\) and CS from \(\varvec{C}^\prime \), we apply the Hungarian algorithm to the matrix in (14). The resulting dissimilarity values of the matched contour segments are denoted as \(d_1, d_2, \ldots , d_N\), and the global dissimilarity \(d_\mathrm{contours}(\varvec{C}, \varvec{C}^\prime )\) is calculated in the same way as in (11).

Shape Distance and Retrieval: We define the distance between two shapes as a weighted fusion of the partition point and contour segment dissimilarities:

$$\begin{aligned} \small d(\varOmega ,\varOmega ^\prime ) = \alpha d_\mathrm{points}(\varvec{P}, \varvec{P}^\prime ) + (1-\alpha ) d_\mathrm{contours}(\varvec{C}, \varvec{C}^\prime ) . \end{aligned}$$
(15)

The weight \(\alpha \) in (15) has been optimised with an integrated Gradient Hill Climbing [7] and Simulated Annealing [8] method on the Kimia-216 dataset. The experimentally optimised weight is \(\alpha = 0.7\), which we employ for all experiments in Sect. 5.

For the task of shape retrieval, given a set of \(H\) shapes, the shape matching algorithm (15) can be applied to obtain an \(H \times H\) distance matrix describing the pairwise relations between all shapes in terms of a dissimilarity measure. Such an approach ignores the fact that the distances to all other shapes also contain important information about the overall shape manifold. Therefore, we employ the method proposed in [21], which improves retrieval results by analysing the underlying structure of the shape manifold. It captures the manifold structure of the data by defining a neighbourhood for each data point in terms of a modified mutual kNN graph, which yields improved performance on all databases in the experiments of Sect. 5.
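The neighbourhood construction can be sketched as a plain mutual kNN graph on the distance matrix; note that this is a simplified stand-in for the modified variant of [21].

```python
import numpy as np

def mutual_knn_graph(D, k):
    """Boolean adjacency that keeps an edge (i, j) only if i is among j's
    k nearest neighbours AND vice versa (self excluded)."""
    n = len(D)
    nn = np.argsort(D, axis=1)[:, 1:k + 1]   # skip self at rank 0
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        A[i, nn[i]] = True
    return A & A.T                            # mutuality: both directions required
```

On a distance matrix with two tight clusters, the mutual requirement keeps only the intra-cluster edges and severs the spurious cross-cluster links, which is what makes such graphs useful for capturing the shape manifold.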

Fig. 10.

Example shapes from the experimental datasets: Kimia-99 (first row), MPEG-400 (second row), Kimia-216 (third row) and EM-200 (fourth row with original, fifth row with manually- and the sixth row with semi-automatically segmented images).

5 Experiments and Results

Correspondence Matching: In Fig. 11 (left), we match a dog to another dog with deformed legs and backside. Since the articulations of the query dog are similar to those of the deformed one, our matching process finds the correct correspondences between partition points as well as contour segments. Figure 11 (right) shows the correspondence between two tools. Judged by human perception, two partition points are mismatched. This is due to the symmetric silhouettes: there are many similar contour segments and partition points within one object, which can affect the performance of the matching approach. However, in terms of geometry, the correspondences are correct, since the matched partition points and contour segments have the same geometrical locations within their objects. Therefore, the final similarity between the two objects is not affected. The shapes in Fig. 12 are taken from the person class of Kimia-99 [22], where the silhouette on the right side has been manually modified. It demonstrates that the proposed method is able to establish a correct correspondence even if parts are substantially deformed. The correspondences obtained on Kimia-99 illustrate that our matching process performs strongly in the presence of occlusion.

Fig. 11.

The corresponding partition points (linked with lines) and contour segments (indicated by colours) between two dogs (left) and two tools (right) from the Kimia-99 dataset.

Fig. 12.

Example correspondence between articulating persons.

Application-Independent Experiments: We tested our algorithm on two shape databases, Kimia-216 [18] (the third row in Fig. 10) and MPEG-400 [2] (the second row in Fig. 10). We compare our method to the Path Similarity Skeleton Graph Matching (PSSGM) [18] and the Contour Segments-based Matching [2], in which contour segments are generated by the endpoints of skeletons. To compare our partition point extraction technique against existing methods, we also tested our algorithm with partition points obtained from Discrete Curve Evolution [23]; to get the best results, we set its stop parameter to 4. For each shape used as a query, we checked whether the retrieved results are correct, i.e. belong to the same class as the query. To enable a quantitative comparison, we kept the experimental convention proposed in [18] and considered the 10 best matches for each query. Results achieved on the Kimia-216 and MPEG-400 datasets can be found in Table 1; note that in [24] the PSSGM algorithm had been re-evaluated without any preliminary assumptions regarding the object skeletonisation. For these two databases, it is important to observe that the fusion of point context and contour segments outperforms the skeleton-based method [18] and the contour segments-based method [2], while the complexity of feature generation and matching is reduced.
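The evaluation convention of [18] can be sketched as follows. This is a hypothetical helper (function name and layout are our assumptions), taking a NumPy distance matrix and integer class labels and returning, for each rank 1–k, the number of correct matches summed over all queries:

```python
import numpy as np

def topk_retrieval_counts(D, labels, k=10):
    """Sketch of the retrieval convention of [18]: for each query shape,
    inspect its k best matches (query excluded) and count, per rank, how
    many retrieved shapes share the query's class.  counts[r] is the
    total number of correct matches at rank r+1 over all queries."""
    H = D.shape[0]
    counts = np.zeros(k, dtype=int)
    for q in range(H):
        order = np.argsort(D[q])          # best matches first
        order = order[order != q][:k]     # drop the query itself
        counts += (labels[order] == labels[q]).astype(int)
    return counts
```

A perfect result on a dataset with \(H\) shapes would thus be \(H\) at every rank, matching the row format of Table 1.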

Environmental Microorganism Retrieval: EMs and their species are very important indicators for evaluating environmental quality, but their manual recognition is very time-consuming [25]. Thus, automatic analysis techniques for microscopic images of EMs would be greatly appreciated by environmental scientists. We tested our methodology for this application using a real dataset acquired in the environmental laboratories of the University of Science and Technology Beijing. We segmented the images manually and semi-automatically with the method introduced in [26]. Examples of original EM images as well as manually and semi-automatically segmented EM images can be seen in Fig. 10. The impressive results on the EM-200 [27] dataset (see Table 1) confirm the high descriptive power of our new contour-based feature extraction technique and prove the applicability of our shape retrieval algorithm to real-world applications.

Table 1. Experimental comparison of our methodology to the most closely related algorithms on the MPEG-400 and Kimia-216 datasets, as well as a proof of applicability of our approach to real-world problems on the EM-200 dataset. Results are summarised as the number of shapes from the same class among the top 1–10 retrieved shapes.

6 Conclusion and Future Work

In this paper, we proposed a 2D shape similarity measure based on contour segment matching which, after fusion with partition point matching, we use for object retrieval. The most innovative parts of our approach are the robust and stable partition point generation and the comparison and matching of contour segments. The algorithm can easily be applied in real applications. In the future, we will investigate possibilities for weighting the properties of the contour segment feature, so that the descriptor can be adapted to a given dataset with a supervised optimisation scheme.