Abstract
We present an approach to visual object-class segmentation and recognition based on a pipeline that combines multiple figure-ground hypotheses with large object spatial support, generated by bottom-up computational processes that do not exploit knowledge of specific categories, and sequential categorization based on continuous estimates of the spatial overlap between the image segment hypotheses and each putative class. We differ from existing approaches not only in our seemingly unreasonable assumption that good object-level segments can be obtained in a feed-forward fashion, but also in formulating recognition as a regression problem. Instead of focusing on a one-vs.-all winning margin that may not preserve the ordering of segment qualities inside the non-maximum (non-winning) set, our learning method produces a globally consistent ranking with close ties to segment quality, and hence to the extent to which entire object or part hypotheses are likely to spatially overlap the ground truth. We demonstrate results beyond the current state of the art for image classification, object detection and semantic segmentation on a number of challenging datasets, including Caltech-101 and ETHZ-Shape as well as PASCAL VOC 2009 and 2010.
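As an illustration of the regression view described above, the following minimal sketch computes a continuous overlap score between each figure-ground hypothesis and a ground-truth mask, then ranks the hypotheses by that score. The abstract does not specify the overlap measure, so intersection-over-union is assumed here; the function names `overlap` and `rank_by_overlap` and the toy masks are hypothetical. In a full system such scores would serve as regression targets during training, and a learned regressor would predict them for unseen segments.

```python
from typing import List, Sequence

# A binary mask as rows of 0/1 values: 1 = figure, 0 = ground.
Mask = Sequence[Sequence[int]]


def overlap(segment: Mask, ground_truth: Mask) -> float:
    """Intersection-over-union of two equally sized binary masks.

    Assumption: IoU stands in for the 'spatial overlap' named in the abstract.
    """
    inter = 0
    union = 0
    for seg_row, gt_row in zip(segment, ground_truth):
        for s, g in zip(seg_row, gt_row):
            inter += 1 if (s and g) else 0
            union += 1 if (s or g) else 0
    return inter / union if union else 0.0


def rank_by_overlap(segments: List[Mask], ground_truth: Mask) -> List[int]:
    """Return segment indices ordered from highest to lowest overlap."""
    scores = [overlap(s, ground_truth) for s in segments]
    return sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)


# Toy 2x3 image: the object occupies the two right-hand columns.
gt = [[0, 1, 1],
      [0, 1, 1]]

hyps = [
    [[1, 1, 1], [1, 1, 1]],  # whole image: covers the object plus background
    [[0, 1, 1], [0, 1, 1]],  # exact object segment
    [[0, 0, 1], [0, 0, 0]],  # a small part of the object
]

targets = [overlap(h, gt) for h in hyps]  # continuous regression targets
order = rank_by_overlap(hyps, gt)         # ordering over all hypotheses
```

Unlike a one-vs.-all label, the continuous targets keep the non-winning hypotheses ordered: the whole-image segment still ranks above the small part, even though neither is the best match.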
Additional information
The first two authors contributed equally.
Cite this article
Carreira, J., Li, F. & Sminchisescu, C. Object Recognition by Sequential Figure-Ground Ranking. Int J Comput Vis 98, 243–262 (2012). https://doi.org/10.1007/s11263-011-0507-2
DOI: https://doi.org/10.1007/s11263-011-0507-2