1 Introduction

Shape is an important visual feature in image analysis and computer vision. Shape description is a key issue in shape retrieval, which has been successfully applied to many tasks such as image retrieval [1,2,3,4], face recognition [5] and 3D model reconstruction [6,7,8].

Existing shape descriptors can be roughly divided into two main categories: global descriptors and local descriptors. Typical global descriptors, such as the shape context (SC) [9] and the inner-distance shape context (IDSC) [10], describe the relative spatial distribution of each feature point with respect to the other points. They are naturally robust to local deformation but fail to capture local details. Local descriptors, on the other hand, represent local shape features precisely. To this end, an effective approach is to decompose the input shape into parts via various strategies, such as the hierarchical procrustes matching (HPM) [11], the shape tree [12] and the hierarchical string cut (HSC) [1]. However, local descriptors are sensitive to strong noise and local deformation.

To overcome the shortcomings caused by noise and intra-class variations, multiscale shape description has been proposed. The curvature scale space (CSS) [13] exploits the Gaussian kernel to produce a multiscale shape representation for shape retrieval. Alajlan et al. [14] propose the triangle-area representation (TAR), which describes the convexity/concavity of each contour point using the signed areas of triangles formed by boundary points at different scales. Yang et al. [15] define three invariant multiscale features to represent the shape. Zhang et al. [16] propose a multiscale ellipse descriptor (MED), in which both spatial location and topological structure are used to extract coarse-to-fine shape details. The underlying idea of a multiscale shape descriptor is to obtain more shape information at different scales: a small scale is highly descriptive of local details, while a large scale yields stable features that overcome noise and local deformation. In addition, some algorithms based on deep features have been proposed for shape retrieval, e.g. CNN [26] and DeepGM [27]. Oliveira et al. [28] build a complex network from the boundary points of 2D shapes and analyze its dynamics by means of spectral graph theory. However, matching shapes under strong noise, intra-class variation and irregular deformation simultaneously remains a challenging problem.

Fig. 1. Effect of using morphological operations to handle irregular shape deformations. (a, c): the input shapes; (b, d): the corresponding shapes after applying morphological operations.

Intra-class variation refers to the varying geometric transformations of shapes, including rotation, scaling, affine transformation, etc. Shapes with noise or intra-class variations are easily assigned to the same class by human perception, yet they are difficult for shape retrieval algorithms. Irregular deformation refers to dramatic gaps inside shapes, as in Fig. 1(a, c), which introduce many interference points and increase the dissimilarity between shapes of the same class. Therefore, it is critical to design an effective, discriminative and robust method for shape retrieval.

Fig. 2. Pipeline of the proposed method: first, a multiscale description of the input shape is produced by extracting its height features in the fused scale space; then, a multiscale integration strategy is used for shape retrieval.

To address the challenges above, it is desirable to extract shape features that are robust to strong noise. Moreover, considering the complementary advantages of global and local shape descriptors, a straightforward idea is to combine global and local shape features into an exhaustive shape representation at different feature scales, which should be robust to intra-class variations and irregular deformation.

In this paper, a novel fused scale-space description is proposed for shape retrieval, as shown in Fig. 2. Based on the height function method [20], we derive a feature sequence from the shape. Morphological operations and Gaussian smoothing are then used jointly to produce a fused scale space that captures multiscale shape information. The morphological scale space handles intra-class and irregular deformations in accordance with human perception, while the Gaussian scale space is robust to noise. Finally, we propose a new strategy for integrating shape similarities across multiple scales to obtain the final matching result.

The main contribution of this paper is twofold. First, the fused scale-space description is proposed in Sect. 2 to handle the problems resulting from different transformations simultaneously. Second, a scale-space integration strategy for shape retrieval is proposed in Sect. 3, which combines the individual retrieval results produced at different scales to generate the final output. Extensive experiments demonstrate the effectiveness and robustness of the proposed method.

2 Shape Description in a Fused Scale Space

When objects are retrieved based on shape features, the results are easily influenced by human perceptual habits. It is well known that Gaussian smoothing can effectively remove noise along the contour, but it cannot handle irregular deformations properly; therefore, Gaussian smoothing alone cannot simulate human perception. Morphological operations, on the other hand, capture the main structure of a shape well. Inspired by the work of Hu et al. [17], we combine the advantages of the two operations and exploit them jointly to produce a scale-space description of the input shape with more comprehensive features.

2.1 The Morphological Scale Space

Morphological operations include dilation and erosion, which modify a shape according to two specific rules using a structuring element (SE) [18]. In particular, the opening and closing operations are defined by applying dilation and erosion in different orders with the same structuring element. In our method, the morphological scale space (MSS) is obtained by closing (dilation followed by erosion) the binary shape with structuring elements of increasing size, as defined in (1). The closing operation is given by:

$$\begin{aligned} M\left( \varsigma ,x,y \right) =B\left( x,y \right) \bullet f\left( \varsigma ,x,y \right) , \end{aligned}$$
(1)

where the \(\bullet \) operator denotes the morphological closing operation applied to the binary shape \(B\left( x,y \right) \). The structuring element \(f\left( \cdot ,\cdot \right) \) is parameterised by its size \(\varsigma \), which serves as the scale parameter. At each MSS level, \(\varsigma \) is increased so that the closing operation affects a larger region. In our experiments, \(\varsigma \) is \(m\cdot 5\) pixels, where m is the MSS level starting from 0.

From Fig. 1, we see that this operation handles shapes with irregular deformations well and preserves the main structure in accordance with human visual perception at \(\varsigma =20\). In more detail, we first use the Matlab function ‘strel’ to create flat disk-shaped structuring elements for different \(\varsigma \), which are invariant to shape rotation. A sequence of shapes is then generated by varying \(\varsigma \) over the morphological scales.
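
As an illustration only (not the authors' MATLAB implementation), the MSS of Eq. (1) can be sketched in Python with scipy.ndimage; the disk-radius step of 5 pixels follows the experimental setting above, and the helper names are our own.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Flat disk-shaped structuring element, analogous to MATLAB's strel('disk', r)."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x ** 2 + y ** 2 <= radius ** 2

def morphological_scale_space(binary_shape, levels=3, step=5):
    """Morphological scale space of Eq. (1): close the binary shape with disks
    of increasing size; level m uses a structuring element of m * step pixels."""
    mss = [binary_shape.astype(bool)]                      # level 0: no closing
    for m in range(1, levels):
        se = disk(m * step)
        mss.append(ndimage.binary_closing(binary_shape, structure=se))
    return mss
```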

Fig. 3. Effect of using Gaussian smoothing to handle contour noise. The first row presents three shapes taken from the MPEG-7 dataset, where the first two belong to the same class and the third belongs to another class. The second row presents the corresponding smoothed versions of the input shapes. In each row, the Euclidean distances between the last two shapes and the first shape (used as a reference) are marked at their bottom-right corners.

2.2 The Gaussian Scale Space

The Gaussian is a conventional kernel for producing a multiscale shape representation [19], by which noise and insignificant shape features along the contour can be effectively suppressed. Denote the shape contour as:

$$\begin{aligned} C= \left( x\left( u \right) ,y\left( u \right) \right) , \end{aligned}$$
(2)

where u is the arc-length parameter normalized by the contour length. The one-dimensional Gaussian filter is expressed as:

$$\begin{aligned} g\left( u,\sigma \right) =\frac{1}{\sigma \sqrt{2\pi }}\exp \left( -\frac{u^{2}}{2\sigma ^{2}} \right) , \end{aligned}$$
(3)

where \(\sigma \) is the width of the Gaussian kernel and is regarded as the scale parameter. Let \(X\left( u,\sigma \right) \) and \(Y\left( u,\sigma \right) \) be the coordinate functions of the contour curve at the scale \(\sigma \), produced by the following convolutions:

$$\begin{aligned} X\left( u,\sigma \right) =x(u)*g(u,\sigma ), \end{aligned}$$
(4)
$$\begin{aligned} Y\left( u,\sigma \right) =y(u)*g(u,\sigma ). \end{aligned}$$
(5)
Fig. 4. The proposed scale-space shape description under different scale parameters: (a) the evolved versions of the input shape under the scales \(\left\{ \varsigma _{0} ,\sigma _{0} \right\} =\left\{ 0,0 \right\} \), \(\left\{ \varsigma _{1} ,\sigma _{1} \right\} =\left\{ 5,8 \right\} \) and \(\left\{ \varsigma _{2} ,\sigma _{2} \right\} =\left\{ 10,16 \right\} \), respectively; (b) the height values of the three evolved versions.

Figure 3 shows the visual effect of Gaussian smoothing and the similarities computed between shapes of the same or different classes. Gaussian smoothing is clearly effective for shapes with strong noise along the contour. In the first row of Fig. 3, three shapes are taken from the MPEG-7 dataset; the second shape belongs to the same class as the first one, while the third shape belongs to a different class. However, when referred to the first shape (a), the third shape (c) has a smaller Euclidean distance than the second shape (b) (the distance values are marked at the bottom-right corners of the two shapes), which leads to an incorrect retrieval result. The second row shows the smoothed versions of the three shapes, where the mis-retrieval is resolved: the second shape (e) becomes closer than the third shape (f) to the first one (d). This satisfies the basic principle in pattern recognition of minimizing intra-class distance while maximizing inter-class distance.

2.3 The Fused Scale Space

To jointly exploit the advantages of morphological operations and Gaussian smoothing, we propose the fused scale-space shape description, generated by applying the two operations together under joint scale parameters \(\left\{ \varsigma ,\sigma \right\} \). The effect of this description is demonstrated in Fig. 4.

From Fig. 4, it can be seen that the device shape preserves its structure better at \(\left\{ \varsigma ,\sigma \right\} =\left\{ 0,0 \right\} \), while \(\left\{ \varsigma ,\sigma \right\} =\left\{ 10,16 \right\} \) is more suitable for human perception. Figure 4(b) shows the height function of the same sample point under the different fused scale parameters. Further experiments indicate that shapes with small deformations require smaller \(\varsigma \) and \(\sigma \), while irregular deformations should be handled with larger \(\varsigma \) and \(\sigma \). Hence, the individual results obtained at different scales should be fused to handle different transformations simultaneously.
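
A minimal sketch of building the fused scale space follows, reusing the disk and smooth_contour helpers above; the boundary-tracing helper (via skimage) and all function names are illustrative choices on our part, not the authors' implementation.

```python
import numpy as np
from scipy import ndimage
from skimage import measure

def extract_contour(mask, n_samples=100):
    """Trace the dominant boundary of a binary mask and resample it to
    n_samples points (approximately uniform); helper for illustration only."""
    contours = measure.find_contours(mask.astype(float), 0.5)
    boundary = max(contours, key=len)                  # keep the longest contour
    idx = np.linspace(0, len(boundary) - 1, n_samples).astype(int)
    return boundary[idx][:, ::-1]                      # (row, col) -> (x, y)

def fused_scale_space(binary_shape, joint_scales, n_samples=100):
    """Evolve the shape under each joint scale {varsigma, sigma},
    e.g. joint_scales = [(0, 0), (5, 8), (10, 16)] as in Fig. 4."""
    evolved = []
    for varsigma, sigma in joint_scales:
        mask = binary_shape if varsigma == 0 else \
            ndimage.binary_closing(binary_shape, structure=disk(varsigma))
        contour = extract_contour(mask, n_samples)     # boundary of the closed mask
        evolved.append(smooth_contour(contour, sigma)) # Gaussian smoothing, Sect. 2.2
    return evolved
```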

At each joint scale (consisting of a \(\varsigma \) and a \(\sigma \)), the height-function shape features [20] are extracted. Figure 5 shows a schematic diagram of the height function descriptor; further details can be found in [20]. The feature vector of the point \(p_{i}\) is an ordered sequence of height values:

Fig. 5. Height function feature descriptor.

$$\begin{aligned} \begin{aligned} H_{i}&=\left( H_{i}^{1},H_{i}^{2},...,H_{i}^{N-1} \right) ^{T}\\&=\left( H_{i,i+1},...,H_{i,N}, H_{i,1},...,H_{i,i-1}\right) ^{T}, \end{aligned} \end{aligned}$$
(6)

where \(H_{i,j}\) denotes the height value of the jth sample point \(p_{j}\) with respect to the point \(p_{i}\), calculated as:

$$\begin{aligned} H_{i,j}=\frac{\det \left( p_{i-1},p_{j},p_{i+1} \right) }{\left| p_{i-1}p_{i+1} \right| }. \end{aligned}$$
(7)
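
A simplified sketch of the raw height matrix of Eqs. (6)–(7) is given below; the smoothing and normalization steps of [20] are omitted, and the implementation and its sign convention are our own.

```python
import numpy as np

def height_features(contour):
    """Raw height function descriptor, Eqs. (6)-(7): for each point p_i, the signed
    distance of every other point p_j to the chord through p_{i-1} and p_{i+1}.
    Returns an (N, N-1) matrix whose i-th row is H_i."""
    n = len(contour)
    H = np.zeros((n, n - 1))
    for i in range(n):
        a, b = contour[(i - 1) % n], contour[(i + 1) % n]   # p_{i-1}, p_{i+1}
        chord = b - a
        length = np.linalg.norm(chord) + 1e-12              # |p_{i-1} p_{i+1}|
        # traverse p_{i+1}, ..., p_{i-1} as in Eq. (6)
        for col, j in enumerate((i + k) % n for k in range(1, n)):
            d = contour[j] - a
            H[i, col] = (d[0] * chord[1] - d[1] * chord[0]) / length
    return H
```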

3 Scale-Space Shape Retrieval

For shape retrieval, we first use dynamic programming to find the optimal correspondence of contour points between the query and the model shape. The shapes in the dataset are then ranked according to their matching scores, measured by the Euclidean distance between corresponding point features. This produces the retrieval result at each scale individually.
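
As a rough illustration of this step (a plain DTW-style alignment, not the exact matching scheme of [20]), the matching cost between two height-feature sequences could be computed as follows, assuming both shapes are sampled with the same number of contour points:

```python
import numpy as np

def match_cost(H_query, H_model):
    """DTW-style dynamic-programming alignment of two height-feature matrices.
    Entry cost is the Euclidean distance between point features; the accumulated
    cost of the cheapest monotone alignment is returned, length-normalized."""
    n, m = len(H_query), len(H_model)
    cost = np.linalg.norm(H_query[:, None, :] - H_model[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j - 1],
                                                 acc[i - 1, j],
                                                 acc[i, j - 1])
    return acc[n, m] / (n + m)
```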

To conduct scale-space shape retrieval, a key step is to properly fuse the individual results obtained at different scales. In this work, we select n+1 joint scales \(S_{0}, S_{1},\ldots, S_{n}\) for the integration. At each scale, the top m most similar shapes are considered. Note that the similarity scores are not directly comparable across scales, because the shapes have undergone different levels of morphological operations and Gaussian smoothing. Therefore, we reset the similarity scores of the returned shapes using a uniform criterion. More specifically, we denote the set of shapes retrieved at the scale \(S_{t}\) as \(\left\{ r_{1}^{t},r_{2}^{t},...,r_{m}^{t}\right\} \). Each shape in this set is assigned a new similarity score based on its rank in the returned list. The new similarity score is defined as a non-linear function of this rank; that is,

$$\begin{aligned} \text {sim}_{i}^{t}=\exp \left( - 2\times L_{i}^{t} \right) , \end{aligned}$$
(8)

where \(L_{i}^{t}\) is the rank of the i-th shape at the scale \(S_{t}\), \(i\in \left\{ 1,2,...,m \right\} \). The function defined above is decreasing and describes the similarity between a returned shape and the query shape at the given fused scale.

The final retrieval result is generated from the union of the returned shape sets over all scales, \(U=\bigcup _{t=0}^{n}\left\{ r_{1}^{t},r_{2}^{t},...,r_{m}^{t}\right\} \), in which each shape carries its new similarity score. Since a database shape may be returned at several scales, the number of distinct shapes in U varies between m and \(m\cdot (n+1)\). To compute the final similarity score of each shape in U, if a shape \(C_{j}\) is not returned at a certain scale \(S_{t}\), we set \(\text {sim}_{j}^{t}=0\).

With the above preparation, the final similarity score of each database shape in U is produced as follows:

$$\begin{aligned} F_{j}=\sum _{k=0}^{n}w_{k}\cdot \text {sim}_{j}^{k} \end{aligned}$$
(9)

where \(w_{k}\) is a weight determining the contribution of the retrieval result at each individual scale. The final retrieval result is obtained by ranking the shapes in U according to their fused similarity scores \(F_{j}\), in a multiscale sense.
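
A compact sketch of this integration strategy (Eqs. (8) and (9)) is shown below; the dictionary-based accumulation and the convention that ranks start at 1 are implementation assumptions on our part.

```python
import numpy as np

def fuse_rankings(per_scale_rankings, weights, m=40):
    """Fuse per-scale retrieval results: rescore each returned shape with
    Eq. (8) and accumulate the weighted sum of Eq. (9); shapes missing from a
    scale implicitly contribute 0. Returns shape ids ranked by F_j."""
    fused = {}
    for ranking, w in zip(per_scale_rankings, weights):
        for rank, shape_id in enumerate(ranking[:m], start=1):
            sim = np.exp(-2.0 * rank)                              # Eq. (8)
            fused[shape_id] = fused.get(shape_id, 0.0) + w * sim   # Eq. (9)
    return sorted(fused, key=fused.get, reverse=True)
```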

4 Experimental Results

4.1 MPEG-7 Shape Dataset

The MPEG-7 shape dataset [10, 21] has been widely used for evaluating shape retrieval algorithms. It consists of 1400 shapes belonging to 70 classes, with 20 shapes in each class. The retrieval accuracy is measured by the well-known bull's-eye score. The result of our method, compared with 8 state-of-the-art methods, is documented in Table 1, where our method achieves the highest accuracy.

Fig. 6. Top 15 retrieved shapes for the query shape “camel” at different scales. The fused result is generated by the proposed multiscale retrieval.

Table 1. Performance comparison of different methods using the bull's-eye score on the MPEG-7 dataset

To illustrate how the proposed multiscale retrieval works, an exemplary experiment is shown in Fig. 6, where 5 joint scales are used, namely \(\{\varsigma _{0},\sigma _{0}\}\), \(\{\varsigma _{1},\sigma _{0}\}\), \(\{\varsigma _{2},\sigma _{0}\}\), \(\{\varsigma _{0},\sigma _{1}\}\) and \(\{\varsigma _{0},\sigma _{2}\}\), with \(\varsigma \in \left\{ 0,5,10 \right\} \) and \(\sigma \in \left\{ 0,8,16 \right\} \). The query shape is a “camel”, and the top 15 retrieved shapes are presented for each scale individually. The non-linear function defined in (8) is then used to compute a new similarity score for each shape. The default value is \(m=40\), and the weights in (9) are taken to be \(w_{0}=0.4,w_{1}=0.2,w_{2}=0.2,w_{3}=0.1\) and \(w_{4}=0.1\). In Fig. 6, the shapes marked with ellipses or boxes are false positives. One can see that the multiscale retrieval (the fused result) is clearly superior to the conventional retrieval (the result at \(\{\varsigma _{0},\sigma _{0}\}\)). It might be argued that the result at some individual fused scale is better than the multiscale retrieval; however, in practice there is no prior knowledge of which scale performs best, and we observe that the multiscale retrieval performs better than any single scale on average.
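
In terms of the sketches above, this experiment would correspond to a call such as the following, where retrieve_at_scale is a hypothetical per-scale ranking routine (not part of the paper):

```python
# Illustrative configuration matching the Fig. 6 experiment
scales = [(0, 0), (5, 0), (10, 0), (0, 8), (0, 16)]   # joint (varsigma, sigma) pairs
weights = [0.4, 0.2, 0.2, 0.1, 0.1]
rankings = [retrieve_at_scale(query, database, s) for s in scales]  # hypothetical helper
result = fuse_rankings(rankings, weights, m=40)
```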

4.2 Kimia Shape Datasets

The Kimia database [24] is another widely used benchmark for shape retrieval, including the Kimia-99 and Kimia-216 datasets. The Kimia-99 dataset contains 99 shapes grouped into 9 classes. The retrieval rates are summarized as the number of shapes from the same class among the top 1 to 10 most similar matches, so the best possible result at each rank is 99. In the experiments, we use 4 joint scales, namely \(\{\varsigma _{0},\sigma _{0}\}\), \(\{\varsigma _{0},\sigma _{1}\}\), \(\{\varsigma _{0},\sigma _{2}\}\) and \(\{\varsigma _{0},\sigma _{3}\}\), with \(\varsigma _{0}=0\) and \(\sigma \in \left\{ 0,5,8,12 \right\} \). The weights in (9) are taken to be \(w_{0}=0.4,w_{1}=0.3,w_{2}=0.2\) and \(w_{3}=0.1\).

Table 2. Retrieval results on the Kimia-99 dataset
Table 3. Retrieval results on the Kimia-216 dataset

The Kimia-216 dataset consists of 18 classes with 12 shapes in each class. The top 11 closest matches belonging to the same class as the query are counted, so the best possible result is 216. In the experiments, we use 4 joint scales, namely \(\{\varsigma _{0},\sigma _{0}\}\), \(\{\varsigma _{1},\sigma _{1}\}\), \(\{\varsigma _{1},\sigma _{2}\}\) and \(\{\varsigma _{1},\sigma _{3}\}\), with \(\varsigma \in \left\{ 0,5 \right\} \) and \(\sigma \in \left\{ 0,5,8,10 \right\} \). The weights in (9) are \(w_{0}=w_{1}=w_{2}=w_{3}=0.25\). The overall retrieval results, compared with other methods, are shown in Tables 2 and 3.

4.3 Robustness Against Noise, Intra-class Variations, and Irregular Deformations

To evaluate our method in the presence of noise, the shape contours are perturbed by Gaussian noise with zero mean and varying standard deviation. The deviation increases from 0.2 to 0.8, and the effect of the noise is demonstrated in Fig. 7. Figure 8 shows the performance of different methods on the Kimia-99 dataset, where the average retrieval result of each method is plotted against the noise level. It can be seen that our method performs stably and produces the best results under various noise intensities.
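
The perturbation used here can be sketched as follows (the choice of random generator and seed is ours):

```python
import numpy as np

def add_contour_noise(contour, std, seed=0):
    """Perturb the boundary points with zero-mean Gaussian noise of the given
    standard deviation (0.2 to 0.8 in the experiments above)."""
    rng = np.random.default_rng(seed)
    return contour + rng.normal(0.0, std, size=contour.shape)
```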

Furthermore, to verify the effectiveness of our method, we compare the Euclidean distances between the query shape and other shapes subject to Gaussian noise, intra-class variation and irregular deformation simultaneously, as shown in Fig. 9. The comparison shows that the distance values remain very similar across 18 different types of transformations, which demonstrates the strong robustness of the proposed method.

Fig. 7. Noisy shape contours. (a) The original shape contour; (b) to (e): the noise intensity increases from 0.2 to 0.8.

Fig. 8. Robustness against noise on the Kimia-99 dataset.

Table 4. Runtimes on the MPEG-7 dataset
Fig. 9. Comparison of Euclidean distances under Gaussian noise, intra-class variation and irregular deformation applied simultaneously.

4.4 Runtimes of Different Methods

To verify the computational efficiency of the proposed method, we conduct experiments on the MPEG-7 dataset in comparison with several state-of-the-art methods. Each shape in the dataset is used as a query, and the average computation time per query is recorded. The comparison results are shown in Table 4. The proposed algorithm requires 60 ms per query; compared with the other five representative shape retrieval algorithms, it has a clear advantage in computational efficiency.

5 Conclusion

In this paper, a new scale-space method is proposed for shape description and retrieval. To overcome the difficulties caused by strong noise, intra-class shape variation and irregular deformation simultaneously, morphological operations and Gaussian smoothing are used jointly to generate a fused scale space of the input shape. Based on the height function, shape features are extracted across scales, and the retrieval results at multiple scales are fused using an integration strategy. Experimental results on benchmark datasets validate the effectiveness and robustness of the proposed method.