Image Descriptors Based on Curvature Histograms

Fischer, Philipp; Brox, Thomas

doi:10.1007/978-3-319-11752-2_19

Philipp Fischer¹⁶ &
Thomas Brox¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 8753))

Included in the following conference series:

German Conference on Pattern Recognition

2927 Accesses
8 Citations

Abstract

Descriptors based on orientation histograms are widely used in computer vision. The spatial pooling involved in these representations provides important invariance properties, yet it is also responsible for the loss of important details. In this paper, we suggest a way to preserve the details described by the local curvature. We propose a descriptor that comprises the direction and magnitude of curvature and naturally expands classical orientation histograms like SIFT and HOG. We demonstrate the general benefit of the expansion exemplarily for image classification, object detection, and descriptor matching.

Supported by a scholarship of the Deutsche Telekom Stiftung.

Access provided by Autonomous University of Puebla. Download conference paper PDF

Visual Features—From Early Concepts to Modern Computer Vision

Detection and Description of Image Features: An Introduction

Complementary Keypoint Descriptors

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Orientation histograms, such as SIFT [14] or HOG [4], are omnipresent in computer vision. They are the dominant descriptors in all recognition tasks and play a major role in structure from motion and image retrieval. This success is because the descriptor is invariant to local deformations on the one hand, and it is still descriptive due to the spatial grid of multiple histograms (cells) on the other hand. It allows recognition approaches to deal with a large set of natural variations without the need to model them explicitly.

However, the invariance to local deformations also frustrates discrimination of structures. HOG and SIFT locally simplify the image to straight lines. Therefore, the pointed tip of a cat’s ear leads to almost the same representation as the roundish ear of a dog (see Fig. 1).

In this paper, we extend the idea of orientation histograms to curvature, i.e., the image is no longer locally simplified to straight lines but curved lines. Consequently, curvature histograms can distinguish local shape at a more detailed level. In contrast to previous work [16], we compute curvature with a per-pixel filter instead of operating on parametric curve segments. Hence, the approach is generally applicable and the descriptor can be computed basically as fast as a SIFT or HOG descriptor.

Moreover, we include the sign of the curvature (convex vs. concave). It is worth noting that the sign of the curvature is different from the sign of the gradient, which often is of little use in classification tasks. In contrast, the sign of the curvature allows the descriptor to get an idea where the interior or exterior of a (convex) object is, independent of the gradient direction; see Fig. 2.

The richer set of features leads to significant improvements in descriptor performance. We demonstrate this with image classification experiments on Caltech 101, detection experiments on Pascal VOC 2007, and descriptor matching experiments on the dataset of Mikolajczyk et al. [15].

2 Related Work

The most closely related work is by Monroy et al. [16], where curvature histograms are used for object detection. In [6] Eigenstetter et al. use these features in combination with self-similarity. Our approach is different in the way how curvature is defined and computed. In [6, 16], the chord-to-point distance accumulation from [10] is used to measure curvature; see Fig. 3. This requires boundary segments, which Monroy et al. obtain performing a boundary probability computation pB [1]. Due to sparsity of such edge maps, some features in the image are lost for the descriptor. In contrast, we compute curvature densely based on the image gradient, which is significantly faster and preserves all relevant features. While Monroy et al. showed good results of their approach on images with strong line segments, our approach is more generally applicable, as we demonstrate in Sect. 4.

There are many works on curvature computation in general. In the continuous setting, curvature is well defined [17], yet on discrete images there are multiple ways to estimate it. Most works on discrete curvature estimation operate on binary images or discrete contours [3, 11, 13]. In contrast, the work in [5] also applies to color images. It assumes a height field surface given by the image intensity, and calculates the principal curvatures on this surface. However, this is fundamentally different from the intuitive understanding of curvature in an image, which is the curvature of isolines. Another classical application of curvature is the use of maxima of the norm of the Hessian to detect interest points. However, in this paper we compute the signed curvature densely on natural color images and curvature is represented by a curvature histogram to increase the expressiveness as a local image descriptor.

In a wider sense, also Carreira et al. [2] is related to our work. They suggested second order pooling of features in the context of semantic segmentation. This work shares the idea that important second order features need to be computed before pooling, as they get lost otherwise. While they propose the use of the feature vector covariance, we propose the use of second order derivatives. The two concepts are complementary.

3 Curvature

In general, the curvature of some curve is understood as how much it deviates from a straight line. Quantitatively, for one point on the curve it is defined as $\kappa = 1/r$, where $r$ is the radius of a circle fitting the curve locally; hence straight lines have zero curvature.

As there are different ways to describe a curve, there are also different ways to define curvature, all leading to the same concept as stated above. We focus on the definition from [17] and assume that the curve $X$ is parameterized by the arc-length $t$, such that equally spaced $t$ map to equally spaced $X(t)$. Let $T(t)$ be the tangent and $N(t)$ the normal at $X(t)$. As in [17], we define curvature as the change of the normal along the tangent:

$$\begin{aligned} \kappa \cdot T = -\frac{d N}{dt} \end{aligned}$$

(1)

For an intuition, imagine how the normal rotates when you walk along a curve. The quicker the change, the higher the curvature. The sign of $\kappa $ depends on the direction of the normal rotation.

3.1 Dense Vector Curvature on Images

When computing curvature in an image, we are limited by its resolution, and hence the underlying curve can never be reproduced accurately. Curvature computed on discrete images can only be an approximation of $\kappa $. In the discrete case, the image gradient must be approximated by considering a neighborhood with some spatial extent, for instance, by using finite differences. To distinguish a slightly curved line from a straight line, the observed area must be large enough. Hence, the maximum and minimum curvature that can be detected, are bounded by pixel size and neighborhood size.

Another important issue is illustrated in Fig. 4. While the divergence of the normalized gradient is a standard method to determine the curvature of the isolines, in the discrete case it can yield undesired curvature responses along straight lines.

Following the idea of Eq. 1, the aim is to determine the change of the gradient (or normal) along the tangent. Locally, the tangent coincides with the curve itself. At each point $(x,y)$ we extract the gradient $g = (g_x,g_y)^T = \nabla I$ using standard linear differencing filters. This yields also the tangent $\phi = (-g_y,g_x)^T$ orthogonal to the gradient. Further, let $g^N = g/|g|$ and $\phi ^N = \phi /|\phi |$ denote their normalized forms.

The Jacobian matrix of the normalized gradient (i.e. its differential) describes the change of gradient direction:

$$\begin{aligned}J(g^N)&=\left( \begin{matrix} \left( g_x^N\right) _x &{} \left( g_x^N\right) _y \\ \left( g_y^N\right) _x &{} \left( g_y^N\right) _y\end{matrix}\right) \end{aligned}$$

Thus, $J(g^N) \cdot \phi ^N \in \mathbb {R}^2$ gives the gradient change in tangent direction. According to (1), the resulting vector is expected to be $-\kappa \cdot T$ and hence parallel to the tangent. On a discrete image, however, small deviations occur. Thus we project this vector to the tangent to obtain the scalar $q$ approximating $\kappa $:

$$\begin{aligned} q = -\left( {\left( \phi ^N\right) }^\top \cdot J(g^N) \cdot \phi ^N\right) \in \mathbb {R} \end{aligned}$$

(2)

Note that this scalar curvature can be positive or negative. The sign depends on both the convexity and the sign of the gradient.

For deriving a descriptor, we suggest combining the curvature with the orientation of the gradient. In this scope, we are interested in decoupling the sign of the curvature from the sign of the gradient to distinguish convex and concave curves (see Fig. 2). To this end, we multiply $q$ with the gradient vector $g^N$

$$\begin{aligned} Q = q \cdot g^N \in \mathbb {R}^2 \end{aligned}$$

(3)

and obtain a vector $Q$, which we call vector curvature.

Intuitively, driving along the curved isoline with a car, the vector $Q$ points in the direction of the centrifugal force that acts on the passenger. Its magnitude is proportional to this force, assuming a constant velocity.

3.2 Feature Binning

With the above procedure, we compute the vector curvature $Q$ densely for every pixel in an image and bin it into sparse per-pixel histograms as illustrated in Fig. 5. We quantize the orientations of $Q$ into 8 bins. This histogram is partially redundant to the orientation histogram, but the sign depends on the direction of the curvature rather than the direction of the gradient. Our experiments show that the combination of orientation, sign and magnitude of curvature makes a strong feature.

Following the standard procedures to assemble SIFT and HOG descriptors [4, 8, 16], we pool these dense features into spatial histograms. The image is divided into a grid of square cells, and for each cell the sparse curvature histograms (one for each pixel) are aggregated by weighting them with their respective gradient magnitude. For detection, we follow the soft binning approach by linearly interpolating between cell centers just as in HOG. Figure 6 shows visualizations of trained HOG and curvature descriptors.

The feature computation on one image of size 300$\,\times \,$250px takes around 0.3 s with HOG. Adding our curvature, increases the computation time to 0.7 s. The approach from [16] takes about 35 s due to the costly extraction of boundary segments.

4 Experiments

To demonstrate the benefits of curvature histograms in a wide range of tasks and images, we tested it in three major areas: image classification, object detection, and descriptor matching.

Curvature is a second order feature and hence a natural expansion to first order features like HOG and SIFT. In isolation, second order features are not meaningful enough to exceed the performance of a good first order feature. Thus, we compared gradient-only features to the version that includes gradient and curvature histograms. A comparison to the boundary curvature from [16] was done in the same way.

4.1 Image Classification

For the experiments on image classification we used the Caltech 101 dataset and the VLfeat implementation of spatial pyramid matching [12, 18]. The baseline method is based on dense SIFT descriptors. It trains a k-means based bag-of-words classifier with an SVM. By collecting SIFT descriptors from random positions and scales, it builds a dictionary using k-means clustering. Using this dictionary, for every training image a spatial histogram is built with one bin per word (i.e. one k-means cluster). Spatial binning is achieved by dividing the image into $2\times 2$ and $4\times 4$ cells and generating histograms for each cell of each division. A $\chi ^2$ kernel map is applied before training the SVM on the histograms. For further details, we refer to the public source code.

We compared this baseline to our curvature-augmented approach which was added in the form of a second spatial pyramid (i.e. using a second dictionary). For training the SVM and during testing, both histograms were concatenated.

Results. For every category, we randomly chose 15 training images and 15 test images. We repeated the whole training and testing process for 10 random sets to report statistics on the accuracy in Fig. 8(a). The accuracy refers to the share of test images that were classified correctly, i.e., the sum of the confusion matrix diagonal. The average increase in accuracy of our method over the baseline is 2.6 %. While the curvature features from [16] work well on some images, they have problems on other images, where the extraction of good boundary segments is harder. On average, the accuracy is slightly lower than the baseline.

The proposed vector curvature improves performance on almost all classes. For some classes it is particularly useful: on the classes pyramid, flamingo head and electric guitar, the curvature extension outperforms baseline SIFT by a large margin. Objects belonging to these classes expose sharp corners, which help recognition. Figure 7 shows two sample images. As shown in Fig. 7(c), the flamingo exposes high curvature at the tip of its beak. Also the neck has a typical curvature. This information is largely ignored by orientation histograms.

4.2 Object Detection

For the evaluation in object detection, we trained filter masks for all 20 object classes used in the Pascal VOC 2007 challenge [7]. For every class, we clustered the training examples in 3 aspect ratios, such that in total 60 filter masks were trained. The training was iterated and used random negatives for the first round. Retraining was based on the hard-negatives. Testing was done in a multi-scale sliding window fashion.

As in image classification, we compared gradient-only features (HOG in this case) to the curvature augmented versions, which add either our vector curvature or the boundary curvature from [16]. For Pascal we also ran the whole training and testing 10 times with different random seeds for sampling negative examples.

Table 1. Pascal VOC 2007 Object Detection Challenge. Average results of 10 iterations of training and testing. While our approach improves the overall result, the approach from [16] on average decreases the performance over the baseline.

Full size table

Results. While a combination of HOG and our vector curvature increases the mean AP by 1.24 % over the baseline, the method from [16] combined with HOG decreases the mean AP by 1.65 %. To show statistical significance, the results are summarized using boxplots in Fig. 8(b). A detailed per-class overview can be found in Table 1. The proposed curvature extension of HOG increases the AP for 18 of 20 classes.

As demonstrated in [16], boundary curvature performs well on datasets with clean boundaries, such as the ETHZ shape dataset [9]. The proposed vector curvature descriptor is more generally applicable as it does not fully depend on strong boundaries. Moreover, it includes important cues provided by the curvature direction.

The raw numbers already indicate the benefit of adding vector curvature to the overall descriptor. Some more distinct advantages can be seen when looking at actual samples. Figure 9 shows false-positives of the bus category. The top row lists the top ranked false-positives detected by HOG that did not occur when using curvature. The bottom row analogously shows the top ranked false-positives detected with curvature that did not occur with HOG. Note how the false-positives obtained by adding curvature all show buses and are only due to suboptimal bounding boxes or duplicate detections. This indicates that by adding curvature we obtain an additional performance gain that is not measured by the standard evaluation criterion.

4.3 Descriptor Matching

In this experiment we evaluated the feature description performance in image matching tasks. We use the standard matching dataset by Mikolajczyk et al. [15], which contains different transformations such as zoom, rotation, blur and lighting changes. The dataset has previously been used for two different tasks: local interest point/region detection and local descriptor matching. We are interested in descriptor matching and intentionally disregard the detector performance by using the same region detector for both descriptors. We chose the MSER (maximally stable extremal regions) detector, because it is among the best detectors in [15]. The image patches of the detected elliptical regions were normalized to a uniform patch size and rotated to the dominant gradient orientation, which is the standard procedure. Given two images and their regions to be matched, we computed the descriptors of the normalized patches. All possible descriptor pairs were then ranked by their Euclidean distance. As in [15], a pair was considered to be a correct match if the overlap error of the region ellipses is less than 40 % when transformed by the ground truth mapping. Subsequently, for each image pair, we computed a precision-recall graph and its average precision (AP).

Results. The dataset contains 8 image categories with 5 image pairs each. The matching results for all 40 image pairs are given in Fig. 10, while the last graph shows the average performance over all categories. The curvature augmented version performs consistently better than SIFT alone.

5 Conclusions

We have presented an elegant expansion of the popular SIFT and HOG descriptors by curvature histograms. The new curvature descriptor can be computed as easily as SIFT and HOG. We have demonstrated the general applicability of the descriptor on very diverse tasks: image classification in a bag-of-features style, object detection with sliding windows, and descriptor matching. The expansion of SIFT or HOG descriptors by curvature consistently increases performance on all three tasks. This demonstrates that the benefits of curvature are not limited to certain datasets or object classes. The features are universal and can be used in various areas of visual recognition. Our source code will be made publicly available for research purposes.

References

Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. PAMI 33(5), 898–916 (2011)
Article Google Scholar
Carreira, J., Caseiro, R., Batista, J., Sminchisescu, C.: Semantic segmentation with second-order pooling. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part VII. LNCS, vol. 7578, pp. 430–443. Springer, Heidelberg (2012)
Chapter Google Scholar
Coeurjolly, D., Miguet, S., Tougne, L.: Discrete curvature based on osculating circle estimation. In: Arcelli, C., Cordella, L.P., Sanniti di Baja, G. (eds.) IWVF 2001. LNCS, vol. 2059, pp. 303–312. Springer, Heidelberg (2001)
Chapter Google Scholar
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR, pp. 886–893 (2005)
Google Scholar
Deng, H., Zhang, W., Mortensen, E.N., Dietterich, T.G., Shapiro, L.G.: Principal curvature-based region detector for object recognition. In: CVPR (2007)
Google Scholar
Eigenstetter, A., Ommer, B.: Visual recognition using embedded feature selection for curvature self-similarity. In: NIPS, pp. 386–394 (2012)
Google Scholar
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results (2007). http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html
Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. PAMI 32(9), 1627–1645 (2010)
Article Google Scholar
Ferrari, V., Tuytelaars, T., Van Gool, L.: Object detection by contour segment networks. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3953, pp. 14–28. Springer, Heidelberg (2006)
Chapter Google Scholar
Han, J.H., Poston, T.: Chord-to-point distance accumulation and planar curvature: a new approach to discrete curvature. Pattern Recogn. Lett. 22(10), 1133–1144 (2001)
Article MATH Google Scholar
Kerautret, B., Lachaud, J.-O.: Robust estimation of curvature along digital contours with global optimization. In: Coeurjolly, D., Sivignon, I., Tougne, L., Dupont, F. (eds.) DGCI 2008. LNCS, vol. 4992, pp. 334–345. Springer, Heidelberg (2008)
Chapter Google Scholar
Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: CVPR, pp. 2169–2178 (2006)
Google Scholar
Liu, H., Latecki, L.J., Liu, W., Bai, X.: Visual curvature. In: CVPR (2007)
Google Scholar
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV 60(2), 91–110 (2004)
Article Google Scholar
Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Gool, L.J.V.: A comparison of affine region detectors. IJCV 65(1–2), 43–72 (2005)
Article Google Scholar
Monroy, A., Eigenstetter, A., Ommer, B.: Beyond straight lines - object detection using curvature. In: ICIP, pp. 3561–3564 (2011)
Google Scholar
Sapiro, G.: Geometric Partial Differential Equations and Image Analysis. Cambridge University Press (2006). http://books.google.de/books?id=LFzGwdi0KtsC
Vedaldi, A., Fulkerson, B.: VLFeat: an open and portable library of computer vision algorithms (2008). http://www.vlfeat.org/

Download references

Acknowledgments

The work was partially funded by the ERC Starting Grant VideoLearn.

Author information

Authors and Affiliations

Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
Philipp Fischer & Thomas Brox

Authors

Philipp Fischer
View author publications
You can also search for this author in PubMed Google Scholar
Thomas Brox
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Philipp Fischer .

Editor information

Editors and Affiliations

Department of Mathematics and Computer Science, University of Münster, Münster, Germany
Xiaoyi Jiang
Computer Science Department 5, University of Erlangen-Nürnberg, Erlangen, Germany
Joachim Hornegger
Department of Computer Science, University of Kiel, Kiel, Germany
Reinhard Koch

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Fischer, P., Brox, T. (2014). Image Descriptors Based on Curvature Histograms. In: Jiang, X., Hornegger, J., Koch, R. (eds) Pattern Recognition. GCPR 2014. Lecture Notes in Computer Science(), vol 8753. Springer, Cham. https://doi.org/10.1007/978-3-319-11752-2_19

Download citation

DOI: https://doi.org/10.1007/978-3-319-11752-2_19
Published: 15 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11751-5
Online ISBN: 978-3-319-11752-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Image Descriptors Based on Curvature Histograms

Abstract

Similar content being viewed by others

Visual Features—From Early Concepts to Modern Computer Vision