Abstract
We present a method to automatically discover meaningful features in unlabeled image collections. Each image is decomposed into semi-local features that describe neighborhood appearance and geometry. The goal is to determine for each image which of these parts are most relevant, given the image content in the remainder of the collection. Our method first computes an initial image-level grouping based on feature correspondences, and then iteratively refines cluster assignments based on the evolving intra-cluster pattern of local matches. As a result, the significance attributed to each feature influences an image’s cluster membership, while related images in a cluster affect the estimated significance of their features. We show that this mutual reinforcement of object-level and feature-level similarity improves unsupervised image clustering, and apply the technique to automatically discover categories and foreground regions in images from benchmark datasets.
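The loop in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation. Per-feature nearest-neighbour matching with a Gaussian score stands in for the partial matching between feature sets. Normalized spectral clustering followed by a deterministic k-means stands in for the grouping step. The re-weighting rule, in which each feature's weight is its average match score against the other images in its cluster, is an assumption chosen for clarity. All function names, the parameters `sigma` and `rounds`, and the synthetic data are hypothetical.

```python
# Hypothetical sketch of alternating feature re-weighting and image clustering;
# not the authors' implementation. Requires NumPy.
import numpy as np


def match_scores(A, B, sigma):
    """Soft score in (0, 1] for each feature (row) of A against its nearest neighbour in B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    return np.exp(-d2.min(axis=1) / sigma ** 2)


def affinity(images, weights, sigma):
    """Symmetric image-level affinity: weighted average of per-feature match scores."""
    n = len(images)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                s = match_scores(images[i], images[j], sigma)
                # Highly weighted features dominate the similarity of image i to image j.
                W[i, j] = (weights[i] * s).sum() / weights[i].sum()
    return 0.5 * (W + W.T)


def spectral_clusters(W, k, iters=20):
    """Normalized spectral embedding, then k-means with farthest-point initialization."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    X = vecs[:, -k:]  # top-k eigenvectors form the embedding
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(axis=-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def update_weights(images, labels, sigma):
    """Weight each feature by its mean match score against the other images in its cluster."""
    new = []
    for i, A in enumerate(images):
        peers = [j for j in range(len(images)) if j != i and labels[j] == labels[i]]
        w = np.mean([match_scores(A, images[j], sigma) for j in peers], axis=0) if peers else np.ones(len(A))
        new.append(w / w.sum() if w.sum() > 0 else np.ones(len(A)) / len(A))
    return new


def cluster_and_reweight(images, k, sigma=1.0, rounds=3):
    """Alternate between clustering images and re-weighting their features."""
    weights = [np.ones(len(A)) / len(A) for A in images]  # start with uniform weights
    for _ in range(rounds):
        labels = spectral_clusters(affinity(images, weights, sigma), k)
        weights = update_weights(images, labels, sigma)
    return labels, weights


# Synthetic collection: two categories; each image holds 5 shared "foreground"
# features (a category prototype plus small noise) and 10 random "background" features.
rng = np.random.default_rng(0)
protos = [rng.normal(0, 3, (5, 8)) for _ in range(2)]
images = [np.vstack([protos[c] + rng.normal(0, 0.05, (5, 8)), rng.normal(0, 3, (10, 8))])
          for c in (0, 0, 0, 0, 1, 1, 1, 1)]
labels, weights = cluster_and_reweight(images, k=2)
print(labels)
```

On this toy collection the shared foreground features should receive most of the weight after the first round, which in turn sharpens the within-cluster affinities used in the next round. This is a simplified picture of the mutual reinforcement between feature-level and image-level similarity that the abstract describes.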
Cite this article
Lee, Y.J., Grauman, K. Foreground Focus: Unsupervised Learning from Partially Matching Images. Int J Comput Vis 85, 143–166 (2009). https://doi.org/10.1007/s11263-009-0252-y