1 Introduction

This chapter surveys a number of part-based and attribute-based models proposed in the last decade in the context of visual recognition, learning, and description for human-computer interaction. Part-based representations have been very successful for various recognition tasks, ranging from detecting objects in cluttered scenes [9, 34] and segmenting objects [16, 107] to recognizing scene categories [45, 72, 77, 92] and fine-grained attributes of objects [10, 98, 111]. Parts provide robustness to occlusion—the head of a person can be detected even when the legs are occluded. Parts can also be composed in different ways, enabling generalization to novel viewpoints, poses, and articulations of objects. Two popular methods, namely the Deformable Part-based Model (DPM) of Felzenszwalb et al. [34] and the poselets of Bourdev et al. [9, 11], exploit this property to build robust object detectors.

The compositional nature of part-based models is also the basis for Convolutional Neural Networks (CNNs). While traditional part-based models can be seen as shallow networks where the representations are hand-designed, CNNs learn all the model parameters from raw pixels to image labels in an end-to-end manner using a deeper architecture. When trained on large labeled datasets, deep CNNs have led to breakthrough results on a number of recognition tasks [44, 48, 87], and are currently the dominant approach for nearly all visual recognition problems.

Beyond recognition, a set of parts provides a means for a human to indicate the pose and articulation of an object. This is useful for recognition with humans “in the loop”, where a person can annotate a part of the object to guide recognition. For instance, Branson et al. [13] interactively categorize birds by asking users to click on discriminative parts, leading to a significant improvement over the computer-vision-only baseline. In such cases it is desirable that the parts represent semantically aligned concepts, since the task involves communication with a human.

Along with parts, visual attributes provide a means to model the appearance of objects. The word “attribute” is extremely generic, as it can refer to any property that might be associated with an object. Attributes can describe an entire object or a part, e.g., a tall person or a long nose. Attributes can refer to low-level properties such as color and texture, or high-level properties such as the age and gender of a person. Attributes can be shared across categories, e.g., both a dog and a cat can be “furry”, allowing the description of previously unseen categories. Semantically aligned attributes provide a basis for learning interpretable visual classifiers [33], creating classifiers for unseen categories [52], debugging recognition systems through attribute-based explanations [3, 76], and providing human feedback during learning and inference [14, 46, 51, 78].

Thus, parts and attributes (PnAs) provide a rich compositional way of describing and recognizing categories. Techniques for PnA discovery are necessary since the desired set of parts and attributes often depends on the underlying task. While it may not be necessary to model the gender, hair style, or eye color of a person in order to detect people, these properties may be useful for identifying a particular individual. One motivating reason for the unified treatment of PnAs in this chapter is that their roles are interchangeable for recognition and description. For instance, in order to distinguish between a red-beaked and a yellow-beaked bird, one could have two parts, “red beak” and “yellow beak”, and no attributes, or a single part “beak” with two attributes, red and yellow. Therefore, from a representation point of view it is more fruitful to think of the joint space induced by various part-attribute interactions instead of each one of them independently. In other words, we can think of attributes as being localized, i.e., associated with a part, or not.

The next section provides an overview of the rest of the chapter, and describes a unified taxonomy of recent PnA discovery methods.

1.1 Overview

Although there are many ways to categorize the vast number of methods for PnA discovery in the literature, the particular one described in this chapter was chosen because it is especially useful for fine-grained domains, which are our main focus. Often these domains have a rich structure described through language, visual illustrations, and other modalities, which can be used to guide representation learning. Translating all this information into useful visual properties is one of the main challenges of these methods. The proposed taxonomy categorizes various PnA methods based on

  • the degree to which the models explicitly try to achieve semantic alignment or interpretability of the underlying PnAs,

  • the nature of the source of semantics, i.e. if they are language-based or not.

When semantic alignment is not the primary goal, the PnAs can be thought of as an intermediate representation of the appearance of objects. Example methods for part discovery in this setting include DPMs [34] and CNNs [48, 56]. Here the learned parts factorize the appearance variation within the category and are learned without additional supervision apart from the category labels at the object or image level. Hence, semantic alignment is not guaranteed and the parts that arise tend to represent visually salient patterns. Similarly, non-semantic attributes can be thought of as the coordinates in a transformed space of images optimized for the recognition task. Such methods are described in Sects. 10.2.1 and 10.2.2.

Language is a natural source of semantics. Although the vocabulary of parts and attributes that arises in language is the result of multiple phenomena, it provides a rich source of interpretable visual PnAs. For instance, parts of animals can be based on the names of anatomical parts. Various existing datasets that contain part annotations follow this strategy. These include the Caltech-UCSD Birds (CUB) dataset [100], the OID:Airplanes dataset [98], and part annotations of animals in the PASCAL VOC dataset [9, 20]. Similarly, attributes can be based on common color, texture, and shape terms used in language, or can be highly specialized language-based properties of the category. For example, the CUB dataset annotates parts of birds with color attributes, while the Berkeley “attributes of people” dataset [10] contains attributes describing gender, clothing, age, etc. We review techniques for collecting language-based attribute and part annotations in Sects. 10.3.1 and 10.3.4 respectively.

Task-specific language-based PnAs can also be discovered by analyzing descriptions of objects (Sect. 10.3.2). For example, Berg et al. [6] analyze captioned images on the web to discover attributes. Nameable attributes may also be discovered interactively by asking annotators to name the principal directions of variation within the data [79], by selecting a subset of attributes that frequently discriminate between instances [80], or by analyzing descriptions of differences between instances [63]. We review such techniques in Sect. 10.3.3.

Beyond language, semantic alignment of PnAs may also be achieved by collecting language-free annotations (Sect. 10.4). An example of this is similarity comparisons of the form “is A more similar to B than C”. The coordinates of an embedding space that reflects these similarity comparisons can be viewed as semantic attributes [101] (Sect. 10.4.1). Another example is when an annotator clicks on corresponding landmarks between pairs of instances. Such data can be collected without having to name the parts, providing a way to annotate parts for categories that do not have a well-defined set of nameable parts [65]. The resulting pairwise correspondence data can be used for learning semantic part appearance models (Sect. 10.4.2).

Figure 10.1 shows the taxonomy pictorially. Existing approaches are divided into three main categories: non-semantic PnAs (Sect. 10.2), semantic language-based PnAs (Sect. 10.3), and semantic language-free PnAs (Sect. 10.4). Within each category we further organize approaches into various sections to illustrate the scenarios when they are applicable and the computational versus annotation-cost trade-offs they offer. We describe some open questions and conclude in Sect. 10.5.

Fig. 10.1

A taxonomy of PnA discovery techniques discussed in this chapter based on the degree of semantic alignment (y-axis) and if they are language-based (x-axis). Various sections and subsections in this chapter are listed within each quadrant

2 Non-semantic PnAs

A common theme underlying techniques for non-semantic PnA discovery is that the parts and attributes arise out of a framework where the goal is a factorized representation of the appearance space. Pictorially, one can think of PnAs as an intermediate representation between the images and high-level semantics. The factorization results in better computational efficiency, statistical efficiency, and robustness of the overall model.

2.1 Attributes as Embeddings

A typical strategy of learning attributes in this setting is to constrain the intermediate representation to be low-dimensional or sparse. Techniques for dimensionality reduction, such as k-means [59], Principal Component Analysis (PCA) [42], Locality Sensitive Hashing [37], auto-encoders [4], and spectral clustering [68], can be applied to obtain compact embeddings.

An early application of such an approach for recognition is the eigenfaces of Turk and Pentland [97]. PCA is applied to a large number of aligned frontal faces to learn a low-dimensional space spanned by the first few PCA basis vectors. These capture the major axes of variation, some of which are aligned with factors such as lighting or facial expression. The low-dimensional embedding was used for face recognition in their setting. One can use an image representation such as the Fisher Vector [81, 82] instead of pixel values before dimensionality reduction for additional invariance. These techniques have no explicit control over the semantic alignment of the representation, and are not guaranteed to lead to interpretable attributes.
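To make this concrete, the following is a minimal sketch of the eigenfaces idea in Python using scikit-learn. The data is random and merely stands in for aligned face crops, and the number of components is an arbitrary choice.

```python
# A minimal eigenfaces-style sketch: PCA over aligned face images yields a
# low-dimensional embedding whose coordinates act as non-semantic attributes.
# The "faces" below are random placeholders for real aligned 64x64 crops.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
faces = rng.random((500, 64 * 64))       # 500 hypothetical aligned face images

pca = PCA(n_components=32)               # keep the first 32 principal axes
embeddings = pca.fit_transform(faces)    # each row is a 32-d "attribute" vector

# Nearest-neighbor face recognition in the embedded space.
query = embeddings[0]
distances = np.linalg.norm(embeddings[1:] - query, axis=1)
print("closest match:", 1 + int(np.argmin(distances)))
```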

In a task-specific setting the intermediate representation can be optimized for the final performance. An example of this is a two-layer neural network for image classification that takes raw pixels as input and produces class probabilities via an intermediate layer which can be seen as attributes.
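A minimal sketch of this idea is shown below, assuming a PyTorch implementation with illustrative layer sizes; the hypothetical `AttributeNet` exposes its hidden layer, which plays the role of a learned, non-semantic attribute vector.

```python
# A two-layer network mapping pixels to class scores; the hidden layer can be
# read off as a task-driven (non-semantic) attribute vector. Sizes are
# illustrative only.
import torch
import torch.nn as nn

class AttributeNet(nn.Module):
    def __init__(self, n_pixels=32 * 32 * 3, n_attributes=64, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_pixels, n_attributes), nn.Sigmoid())
        self.classifier = nn.Linear(n_attributes, n_classes)

    def forward(self, x):
        attributes = self.encoder(x)        # intermediate "attribute" layer
        return self.classifier(attributes), attributes

model = AttributeNet()
images = torch.rand(8, 32 * 32 * 3)        # a batch of flattened images
logits, attributes = model(images)
print(attributes.shape)                    # torch.Size([8, 64])
```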

There are many realizations of this strategy in the literature that vary in the specifics of the architecture and the nature of the task. For example, the “picodes” approach of Bergamo et al. [7] learns a compact binary descriptor (e.g., 16 bytes) that achieves good object recognition performance. Attributes are parametrized as \(a(\mathbf {x}) = \mathbf {1}[\mathbf {w}^T\mathbf {x} > 0]\), for some weight vector \(\mathbf {w}\) and an input representation \(\mathbf {x}\). Rastegari et al. [86] use a similar parameterization but optimize a notion of “predictability”, measured as the separation between classes achieved by the attributes. Yu et al. [109] learn attributes by formulating the problem as matrix factorization.
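The binary parameterization above can be sketched in a few lines; the projection matrix below is random rather than learned, so this only illustrates the form of the descriptor, not how picodes or the predictability objectives optimize it.

```python
# Binary attributes of the form a(x) = 1[w^T x > 0]: each row of W defines one
# attribute, and 128 bits pack into a 16-byte code per image.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 4096))        # 128 attributes over a 4096-d input
x = rng.standard_normal(4096)               # an image descriptor

bits = (W @ x > 0).astype(np.uint8)         # one bit per attribute
code = np.packbits(bits)
print(code.nbytes, "bytes")                 # 16 bytes, compact enough for retrieval
```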

Experiments reported in the above works show that task-driven attributes achieve better performance than unsupervised attribute discovery methods on datasets such as Caltech-256 [40] and ImageNet [28]. Moreover, they provide a compact representation of images for efficient retrieval and other applications.

2.2 Part Discovery Based on Appearance and Geometry

In addition to appearance, part-based models can take into account the geometric relationships between the parts during learning. In the unsupervised, or task-free, setting, parts may be obtained by clustering local patches using any unsupervised method such as k-means, spectral clustering, etc. This is one of the key steps in the bag-of-visual-words representation of images [24] and its variants such as the Fisher Vector [81, 82] and the Vector of Locally Aggregated Descriptors (VLAD) [43], which are among the early successful image representations.
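A minimal bag-of-visual-words pipeline along these lines is sketched below with scikit-learn; the descriptors are random stand-ins for local features such as SIFT, and the codebook size is arbitrary.

```python
# Bag-of-visual-words: cluster local descriptors into a codebook, then
# represent an image by a normalized histogram of visual-word assignments.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_descriptors = rng.random((10000, 128))        # pooled local descriptors
codebook = KMeans(n_clusters=256, n_init=4).fit(train_descriptors)

image_descriptors = rng.random((300, 128))          # descriptors from one image
words = codebook.predict(image_descriptors)
bow = np.bincount(words, minlength=256).astype(float)
bow /= bow.sum()                                    # the image representation
```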

Geometric information can be added during the clustering process to account for spatial consistency, e.g., by coarsely quantizing the space using a spatial pyramid [55], or by appending the coordinates of the local patches (called “spatial augmentation”) to the appearance before clustering [90, 91]. Parts may also be discovered via correspondences between pairs of instances obtained by some low-level matching procedure. For instance, Berg et al.  [5] discover important regions in images by considering geometrically consistent feature matches across instances.
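The spatial-augmentation variant amounts to a small change to the clustering input, sketched below under the same synthetic-data assumptions; the weight `alpha` controlling the influence of location is a made-up parameter.

```python
# "Spatial augmentation": append (weighted) patch locations to the appearance
# descriptors before clustering, so parts are consistent in appearance and in
# rough image position.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.random((10000, 128))      # appearance of local patches
locations = rng.random((10000, 2))          # patch centers, normalized to [0, 1]
alpha = 0.5                                 # weight on the spatial coordinates

augmented = np.hstack([descriptors, alpha * locations])
parts = KMeans(n_clusters=64, n_init=4).fit_predict(augmented)
```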

Fig. 10.2

a Two components of the deformable part-based model learned for the person category. The “root” and “part” templates are shown using the HOG feature visualization (left and middle) and the spatial model is shown on the right. b Examples of discriminative patches discovered for various classes in the PASCAL VOC dataset

Another example of a model that combines appearance and geometry for part learning is the DPM of Felzenszwalb et al. [34]. The model has been widely used for object detection in cluttered scenes. A category is modeled as a mixture of components, each of which is represented as a “root” template and a collection of “parts” that can move independently relative to the root template. The tree-like structure of the model allows efficient inference through distance transforms. The parameters of the model are learned through an iterative procedure where the component memberships, part positions, and appearance models are updated in order to obtain good separation between positive examples and the background. Figure 10.2a shows two components learned for person detection on the PASCAL VOC dataset [32]. The compositional architecture of the DPM led to significant improvements over the monolithic template-based detector of Dalal and Triggs [25].
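The scoring rule of such a model can be illustrated with a simplified sketch: the score at a candidate root location is the root response plus, for each part, the best trade-off between part response and a quadratic deformation cost relative to its anchor. The response maps below are random placeholders, and the exhaustive maximization stands in for the distance transforms used in practice.

```python
# Simplified DPM-style scoring with a root filter and four deformable parts.
import numpy as np

rng = np.random.default_rng(0)
H, W = 40, 40
root_score = rng.standard_normal((H, W))                  # root filter responses
part_scores = [rng.standard_normal((H, W)) for _ in range(4)]
anchors = [(5, 5), (5, -5), (-5, 5), (-5, -5)]            # part offsets from root
deform = 0.1                                              # quadratic deformation weight

def best_part_score(part_map, anchor, root_y, root_x):
    ys, xs = np.mgrid[0:H, 0:W]
    cost = deform * ((ys - root_y - anchor[0]) ** 2 + (xs - root_x - anchor[1]) ** 2)
    return np.max(part_map - cost)    # exhaustive search; DPMs use distance transforms

y, x = 20, 20
score = root_score[y, x] + sum(best_part_score(p, a, y, x)
                               for p, a in zip(part_scores, anchors))
print("score at root (20, 20):", round(float(score), 3))
```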

Another example of task-driven part discovery is the “discriminative patches” approach of Singh et al. [92]. Here parts are initialized by clustering appearance, and the part appearances are iteratively refined through a process of positive and hard-negative mining. Finally, parts that are frequent and help discriminate among classes are selected. Figure 10.2b shows example discriminative patches discovered for the PASCAL VOC dataset. The authors demonstrate good performance on image classification datasets such as PASCAL VOC and MIT Indoor scenes [83], using a representation that records the activations of discriminative patches at different locations and scales (similar to a bag-of-visual-words model [24]).

Since these methods primarily rely on appearance and geometric consistency, the discovered parts may not be aligned to semantics. For instance, the DPM requires that each object have the same set of parts even if the object is partially occluded. Hence the model uses a part to recognize both a part of the object and its occluder. Similarly, discriminative patches are visually consistent parts according to the underlying Histograms of Oriented Gradients (HOG) features [25], and hence two patches that are visually dissimilar but belong to the same semantic category are unlikely to be grouped as the same part. For example, two kinds of car wheels, or two styles of windows, will be represented using two or more parts.

Convolutional Neural Networks (CNNs) can be seen as part-based models trained in an end-to-end manner, i.e. starting from a pixel representation and going all the way to class labels. The hierarchy of convolution and max-pooling layers resembles the computation of a deformable part-based model. Indeed, the DPM can be seen as a particular instantiation of a CNN since both HOG (see Mahendran and Vedaldi [62]) and the DPM computations (see Girshick et al. [38]) can be written as shallow CNNs. However, after the recent breakthrough result of Krizhevsky et al. [48] on the ImageNet classification dataset [28], CNNs have become the architecture of choice for nearly all visual recognition tasks [12, 23, 39, 44, 60, 87, 94, 111, 112].

CNNs trained in a supervised manner can be seen to simultaneously learn parts and attributes. For instance, visualizations of the “AlexNet CNN” [48] by Zeiler and Fergus [110], as seen in Fig. 10.3, reveal units that activate strongly on parts such as human and dog faces, as well as attributes such as “text” and “grid-like”. Recent works, such as the bilinear CNNs [57], show that discriminative localized attributes emerge when these models are fine-tuned for fine-grained recognition tasks. Figure 10.4 shows example filters learned when these models are trained on the birds [100], cars [47], and airplanes [64] datasets. The remarkable performance of CNNs shows that considering part and attribute discovery jointly can have significant benefits.
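The core bilinear pooling operation behind these models can be sketched as follows; the feature maps are random placeholders for the outputs of two convolutional streams, and the signed square-root and L2 normalization follow the common practice described in [57].

```python
# Bilinear pooling: combine two feature maps by an outer product at each
# location and sum-pool over the image, capturing localized feature interactions.
import torch

feat_a = torch.rand(8, 512, 28, 28)       # stream A: (batch, channels, H, W)
feat_b = torch.rand(8, 512, 28, 28)       # stream B on the same spatial grid

a = feat_a.flatten(2)                     # (8, 512, 784)
b = feat_b.flatten(2)
bilinear = torch.bmm(a, b.transpose(1, 2)) / a.shape[-1]   # (8, 512, 512)

# Signed square-root and L2 normalization before the linear classifier.
descriptor = torch.sign(bilinear) * torch.sqrt(torch.abs(bilinear) + 1e-8)
descriptor = torch.nn.functional.normalize(descriptor.flatten(1), dim=1)
```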

Fig. 10.3

Visualizations of the top activations of six conv5 units of the AlexNet CNN [48] trained on the ImageNet dataset [28]. For each image patch, the locations responsible for the activations are also shown on the left. The units strongly respond to parts such as dog and human faces, as well as attributes such as “grid-like” and “text”. Figure source: Zeiler and Fergus [110]

Fig. 10.4

Visualizations of the top activations of several units of the “bilinear CNN” (B-CNN [D,M] model) [57] fine-tuned on the birds [100] (left), cars [47] (middle), and airplanes [64] (right) datasets. Each row shows the patches in the training data with the highest activations for a particular unit of the “D network” (see [57] for details). The units correspond to various localized attributes ranging from yellow-red stripes (row 4) and particular beak shapes (row 8) for birds, wheel detectors (rows 6, 8, 9) for cars, to propellers (rows 1, 4) and vertical-stabilizer types (row 8) for airplanes

3 Semantic Language-Based PnAs

Language is the source of categories for virtually all modern datasets in computer vision. The widely used ImageNet dataset reflects the hypernymy hierarchy (“is a” relationships) of nouns in WordNet—a lexical database of words in English organized in a variety of ways [67]. Naturally, language is also a source of PnAs useful for a high-level description of objects, scenes, materials, and other visual phenomena. For example, a cat can be described as a four-legged furry animal. This human-interpretable description of learned models provides a means for communication between a human and a machine during learning and inference. Below we overview several applications of language-based PnAs from the literature.

3.1 Expert Defined Attributes

An early example of language-based attributes in the computer vision community was for describing texture. Bajcsy proposed attributes such as orientation, contrast, size, and spacing of structural elements in periodic textures [2]. Tamura et al. [95] identified six visual attributes of textures, namely coarseness, contrast, directionality, linelikeness, regularity, and roughness. Amadasun and King derived computational models for five properties of texture, namely coarseness, contrast, busyness, complexity, and texture strength [1].

Recently, Cimpoi et al.  [22] extended the set of describable attributes to include 47 different words based on the work of Rao and Lohse [85]. Other texture attributes such as material properties have been used to construct datasets such as CUReT [26], UIUC [54], UMD [105], Outex [69], Drexel Texture Database  [71], KTH-TIPS [17, 41] and Flickr Material Dataset (FMD) [89]. In all the above cases experts identified the set of language terms as attributes based on domain knowledge, or in some cases through human studies [85].

Beyond textures, language-based attributes have since been proposed for a variety of other datasets and applications. Farhadi et al. [33] describe object categories with shape, part-name, and material attributes. Lampert et al. [52] proposed the Animals with Attributes (AwA) dataset consisting of a variety of animals with shared attributes such as color, food habits, size, etc. The Caltech-UCSD Birds (CUB) dataset [100] consists of hundreds of species of birds labeled with attributes such as the shape of the beak, the color of the wings, etc. The OID:Airplanes [98] dataset consists of airplanes labeled with attributes such as the number of wings, type of wheels, shapes of parts, etc. Attributes such as gender, eye color, hair style, etc., have been used by Kumar et al. [49] to recognize, describe, and retrieve faces. Other examples include attributes of people [10], human actions [58], clothing style and fashion [19, 106], urban tribes [50], and aesthetics [30].

A challenge in using language-based attributes is the degree of specialization to be considered. For instance, while an attribute of an airplane such as the shape of the nose can be understood by most people, an attribute such as the type of aluminum alloy used in manufacturing can only be understood by a domain expert. Similarly, the scientific names of parts of animals are typically known only to a domain expert. While common attributes have the advantage that they can be annotated by “crowdsourcing”, they may lack the precision needed for fine-grained discrimination between closely related categories. Bridging the gap between expert-defined and commonly used attributes remains an open question. In the context of object categories this aspect has been studied by Ordonez et al. [70], who learn common names (“entry-level categories”) by analyzing the frequency of usage in text on the Internet, e.g. grampus griseus is translated to a dolphin.

3.2 Attribute Discovery by Automatically Mining Text

Language-based attributes may also be mined from large sets of images with captions. Ferrari and Zisserman [36] mine attributes of texture and color from descriptions on the web. Berg et al. [6] obtain attributes by mining frequently occurring phrases from captioned images and estimating whether they are visually salient by training a classifier to predict the attribute from images (Fig. 10.5a). In the process they also characterize whether the attributes are localized or not. Text on the Internet from online books, Wikipedia articles, etc., has been mined to discover attributes for objects [31] (Fig. 10.5b), semantic affordances of objects and actions [18], and other common-sense properties of the visual world [21].
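A toy version of this mining-plus-visualness recipe is sketched below: frequent caption words become candidate attributes, and each candidate is scored by how well a linear classifier can predict it from image features. The captions and features are synthetic and the frequency threshold is arbitrary; this only illustrates the overall recipe, not the specifics of [6].

```python
# Mine frequent caption words and score their "visualness" as the cross-validated
# accuracy of predicting the word from image features.
import numpy as np
from collections import Counter
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
captions = ["red leather bag", "black leather bag", "red canvas tote"] * 50
features = rng.random((len(captions), 64))      # stand-in image features

counts = Counter(word for caption in captions for word in caption.split())
candidates = [word for word, n in counts.items() if n >= 20]    # frequent words

for word in candidates:
    labels = np.array([word in caption.split() for caption in captions])
    if labels.all() or not labels.any():
        continue                                # skip words with no contrast
    acc = cross_val_score(LogisticRegression(max_iter=1000), features, labels, cv=3).mean()
    print(word, round(acc, 2))                  # higher accuracy = more "visual"
```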

Fig. 10.5

a Automatically discovered handbag attributes from [6], sorted by “visualness” measured as the predictability of the attribute based on visual features. b Automatically mined visual attributes for various categories from books [31]

3.3 Interactive Discovery of Nameable Attributes

While captioned images are a great source of attributes, the vast majority of categories are not well represented in captioned images on the web. In such situations one can aim to discover nameable attributes interactively. Parikh and Grauman [73] show annotators images that vary along a projection of the underlying features and ask them to describe the variation, if possible (Fig. 10.6a). To be effective, the method requires a feature space whose projections are likely to be semantically correlated.

Patterson and Hays [80] start from a set of attributes mined from natural language descriptions and ask annotators to select five attributes that distinguish images from various scene classes in the SUN database. Thus attributes suited for discrimination within the set of images can be discovered (Fig. 10.6b).

A similar strategy was used in my earlier work [63] where annotators were asked to describe the visual differences between pairs of images (Fig. 10.6c) revealing fine-grained properties useful for discrimination. The collected data was mined to discover a lexicon of parts and attributes by analyzing the frequency and co-occurrence of words in the descriptions (Fig. 10.7).

Fig. 10.6

Interactive attribute discovery. Annotators are asked to a name what varies in the images from left to right [73], b select attributes that distinguish images on the left from the right [80], and c describe the differences between pairs of instances [63]. The collected data is analyzed to discover a set of nameable attributes

Fig. 10.7

The vocabulary of parts (top row) and their attributes (bottom row) discovered from sentence pairs describing the differences between images in the OID:Airplanes dataset [98]. The three most discriminative attributes are also shown. Figure source: Maji [63]

3.4 Expert Defined Parts

Like attributes, language-based parts have been widely used in computer vision for modeling articulated objects. An early example of this is the pictorial structure model for detecting people in images, where parts were based on human anatomy [35]. A modeling decision unique to parts, compared to attributes, is the choice of the spatial extent, scale, pose, and other visual properties of a given semantic part.

Broadly, there are two commonly used methods for collecting part annotations (Fig. 10.8). The first is landmark-based, where the positions of landmarks, such as the joint positions of humans or fiducial points on faces, are annotated. The second is bounding-box-based, where part bounding-boxes are explicitly labeled to define the extent of each part. The bounding-boxes may be further refined to reflect the pixel-wise support, or segmentation, of the parts.

When landmarks are provided one could simply assume that parts correspond to these landmarks. This strategy has been applied for modeling faces with fiducial points [113], articulated people with deformable part-based models [35, 108], etc. Another strategy is to discover parts that correspond to frequently occurring configurations of landmarks. The poselets approach combines this strategy with a procedure to select a set of diverse and discriminative parts for the task of person detection [9]. The discovered poselets are different from both landmarks and anatomical parts (Fig. 10.9a). For instance, a part consisting of half the profile face and the right shoulder is a valid poselet. These patterns can capture distinctive appearances that arise due to self-occlusion, foreshortening, and other phenomena which are hard to model in a traditional part-based model.
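A rough sketch in the spirit of this idea is shown below: landmark configurations are normalized for translation and scale and then clustered, so that each cluster corresponds to a recurring pose fragment. This glosses over the window sampling, alignment, and selection steps of the actual poselets pipeline, and the keypoints are random stand-ins for annotated human landmarks.

```python
# Cluster normalized landmark configurations; each cluster can seed a detector
# for a recurring pose fragment (e.g. "head and right shoulder").
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
keypoints = rng.random((1000, 14, 2))              # 1000 people, 14 (x, y) landmarks

centered = keypoints - keypoints.mean(axis=1, keepdims=True)
scale = np.linalg.norm(centered, axis=(1, 2), keepdims=True)
normalized = (centered / scale).reshape(len(keypoints), -1)

clusters = KMeans(n_clusters=50, n_init=4).fit_predict(normalized)
```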

Fig. 10.8

Two methods for collecting part annotations. On the left, the positions of a set of landmarks are annotated. On the right, bounding-boxes for parts are annotated

When bounding-boxes are provided there is relatively little flexibility in part discovery. Much work in this setting has focused on effectively modeling appearance through a mixture of templates. Additional annotations, such as viewpoint, pose, or shape, can be used to guide mixture model learning. For instance, Vedaldi et al.  [98] show that using shape and viewpoint annotations to initialize HOG-based parts improves detection accuracy compared to the aspect-ratio based clustering (Fig. 10.9b).

Fig. 10.9

Visual part discovery from annotations. a Poselets discovered for detecting people using landmark annotations on the PASCAL VOC dataset. Figure source: Bourdev et al. [9]. b Detection AP using \(k=40\) mixture components based on aspect-ratio clustering, left-right clustering, and supervised shape clustering. Nose shape clusters learned by EM are shown at the bottom. Figure source: Vedaldi et al. [98]

4 Semantic Language-Free PnAs

Language-based PnAs, when applicable, provide a rich semantic representation of objects. However, language alone may not be sufficient to capture the full range of visual phenomena. Consider the space of colors defined by [R, G, B] values. Berlin and Kay, in their seminal work [8], analyzed the words used to describe color across a wide range of languages. While languages like English have many words to describe color, other languages have very few, including an extreme case of a language with only two words (“bright” and “dull”) to describe color, leading to a gross simplification of the color space. Similarly, restricting oneself to nameable parts poses challenges in annotating categories that are structurally diverse. It would require significant effort to define a set of parts that applies to all chairs, or all buildings, since the resulting set of parts would have to be very large to account for the diversity within the category. Moreover, the parts are unlikely to have intuitive names, e.g. “top-right corner of the left handle”.

In this section we overview methods to discover semantically aligned PnAs without restricting oneself to language-based interfaces. The underlying approach is to collect annotations of instances relative to one another. Such annotations provide constraints which can be used to guide the alignment of the representation to semantics. We describe several examples of such approaches.

4.1 Attribute Discovery from Similarity Comparisons

Similarity comparisons of the form “A is more similar to B than C” can be used to obtain annotations without relying on language. These can be used to transform the data into a Euclidean space that respects the similarity constraints using methods for distance metric learning [27, 104], large-margin nearest neighbor learning [103], t-STE [61], Crowd Kernel Learning [96], etc.
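Below is a minimal sketch of learning such an embedding from triplet constraints with a margin-based loss, which is a simpler stand-in for methods like t-STE; the triplets are synthetic, and the margin and learning rate are arbitrary.

```python
# Learn a 2-d embedding so that "A is more similar to B than to C" translates
# into d(A, B) < d(A, C) for each annotated triplet.
import torch

n_items, dim = 100, 2
emb = torch.randn(n_items, dim, requires_grad=True)
triplets = torch.randint(0, n_items, (500, 3))      # rows of (A, B, C) indices
opt = torch.optim.Adam([emb], lr=0.05)

for _ in range(200):
    a, b, c = emb[triplets[:, 0]], emb[triplets[:, 1]], emb[triplets[:, 2]]
    d_ab = (a - b).pow(2).sum(dim=1)
    d_ac = (a - c).pow(2).sum(dim=1)
    loss = torch.clamp(d_ab - d_ac + 1.0, min=0).mean()    # margin of 1.0
    opt.zero_grad()
    loss.backward()
    opt.step()
# The coordinates of `emb` now (approximately) respect the similarity constraints.
```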

Figure 10.10 shows a visualization of the categories in the CUB dataset using a two-dimensional embedding learned from crowdsourced similarity comparisons between images [101]. Each image-level similarity constraint is converted to a category-level constraint using the category labels of the images, and an embedding is then learned from these constraints using t-STE. A group of points on the bottom-right corresponds to perching birds, while another group on the bottom-left corresponds to gull-like birds.

Fig. 10.10

A visualization of the first two dimensions of the 200-node category-level similarity embedding. Visually similar classes tend to belong to coherent clusters (circled and shown with selected representative images). Figure source: Wah et al.  [101] (Best viewed digitally with zoom)

Since a representation learned in this manner respects the underlying perceptual similarity, it can be used as a means of interacting with a user for fine-grained recognition. Wah et al. [101] build an interface where users interactively recognize bird species by selecting the most similar image in a display. The underlying perceptual embedding is used to select the images to be displayed in each iteration. The authors show that the method requires fewer questions to get to the right answer than the attribute-based interface of Branson et al. [14].

A drawback of similarity comparisons is that there can be considerable ambiguity in the task, since there are many ways to compare images. Most methods for learning embeddings do not take this into account and hence are less robust to annotations collected via “crowdsourcing”, which can have significant noise. A number of approaches aim to reduce this ambiguity by providing additional instructions to the annotators.

The relative attributes approach of Parikh and Grauman [74] guides similarity comparisons by focusing on a particular describable attribute. An example annotation task is: is A smiling more than B? (see Fig. 10.11a). Such annotations are used to learn a ranking function, or a one-dimensional embedding, of images corresponding to the attribute. Relative attributes bridge the gap between categorical attributes and low-dimensional semantic embeddings, and have been used for interactive search and learning of visual attributes [46, 75].
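A relative attribute can be sketched as a linear ranking function learned from ordered pairs, as below; the features and pairs are synthetic, and the hinge-loss optimization is a simple stand-in for the RankSVM-style formulation used in [74].

```python
# Learn w so that w^T x_i > w^T x_j whenever image i exhibits the attribute
# (e.g. "smiling") more than image j.
import torch

n_images, dim = 200, 512
features = torch.rand(n_images, dim)
pairs = torch.randint(0, n_images, (1000, 2))      # (more, less) index pairs

w = torch.zeros(dim, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.01)
for _ in range(300):
    margin = features[pairs[:, 0]] @ w - features[pairs[:, 1]] @ w
    loss = torch.clamp(1.0 - margin, min=0).mean() + 1e-4 * w.pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

strength = features @ w     # a one-dimensional "how much" score per image
```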

Wah et al. [101] guide similarity comparisons by restricting the image to a part of the object, as seen in Fig. 10.11b, to obtain a semantic embedding of parts. The authors use parts discovered using the discriminative patches approach [92], but part annotations can be used instead when available. The authors show that localized perceptual similarities provide a richer way of indicating closeness to a test image and lead to better efficiency during interactive recognition tasks.

Fig. 10.11

a In the relative attributes framework an attribute is measured relative to other images, e.g. is the person in the image smiling more, or less, than the other images. Figure source: Parikh and Grauman [74]. b Global or localized similarity comparisons are used to learn a perceptual embedding of the entire object or parts respectively. Figure source: Wah et al.  [102]

4.2 Part Discovery from Correspondence Annotations

Traditional methods for annotating parts require a set of nameable parts. When such parts are not readily available one can instead label correspondences between pairs of instances. Maji and Shakhnarovich [65, 66] show that when annotators are asked to mark correspondences between image pairs within a category, the result is fairly consistent across annotators, even when the names of the parts are not known (Fig. 10.12a). Annotators rely on semantics beyond visual similarity to mark correspondences—two windows are matched even though they are visually different.

Methods for part discovery that rely on appearance and geometry can be extended to take into account the pairwise constraints obtained from such correspondence annotations. The authors propose an approach where the patches corresponding to a semantic part are iteratively updated while respecting the underlying matches between image pairs. The resulting discovered patches are both visually and semantically aligned and can be used for rich part-based analysis of objects, including detection and segmentation [66].
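One very rough way to fold such constraints into a standard clustering pipeline is to treat annotated matches as soft must-link constraints, as in the sketch below; this is only loosely inspired by the approach of [65, 66] and uses random descriptors and matches for illustration.

```python
# Soft must-link clustering: pull matched patch descriptors toward each other
# before running k-means, so corresponding patches tend to share a part label.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
patches = rng.random((2000, 128))                  # patch descriptors
matches = rng.integers(0, 2000, (300, 2))          # annotated corresponding pairs

blended = patches.copy()
for i, j in matches:
    mean = 0.5 * (patches[i] + patches[j])
    blended[i], blended[j] = mean, mean

parts = KMeans(n_clusters=40, n_init=4).fit_predict(blended)
```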

Another method that implicitly obtains correspondences is the BubbleBank approach of Deng et al. [29]. Annotators are shown two images A and B, and asked which of the two is the category of a third image (Fig. 10.12b). The caveat is that the third image is blurry, but the user can click on parts of the image to reveal what is underneath. Since corresponding parts have to be compared in order to accurately recognize the category, such annotations reveal the salient regions or parts for a given category. These clicks are used to create the BubbleBank representation, a set of parts centered around the frequently clicked locations, which is applied to fine-grained recognition.

Fig. 10.12

a Annotators click on corresponding regions between image pairs to indicate parts [65, 66]. b The Bubbles game shows annotators a blurry image in the middle and asks which one of the two categories, left or right, it belongs to. The user can click on a region of the blurry image to reveal what is underneath. These clicks reveal the discriminative regions within an image, which are used to learn a part-based representation called the BubbleBank. Figure source: Deng et al. [29]

5 Conclusion

This chapter summarizes the current techniques for PnA discovery by categorizing them into three broad categories. The methods described are most relevant for describing and recognizing fine-grained categories, but this is by no means a complete account of existing methods. Unsupervised part-based methods alone have a rich history, and even within the DPM family methods vary in how they model part appearance and the geometric relationships between parts. See Ramanan [84] for an excellent survey of classical part-based models.

Similarly, a sub-field of Human-Computer Interaction (HCI) designs “games with a purpose” to annotate properties of images, including attributes and part labels. A well-known example is the ESP game [99], where a pair of annotators independently tag images and get rewarded only if the tags match. This makes the task competitive, encouraging participation, and reduces vandalism. Some frameworks discussed in this chapter, such as pairwise correspondence for part annotations, describing the differences for attribute discovery, and the Bubbles game, fall into this category. For a good overview of such techniques see the lecture notes by Law and von Ahn [53].

We also did not cover methods that discover the structure of objects by analyzing their motion over time. This has been well studied in robotics for discovering the kinematic structure of articulated objects [15, 93]. Although this works best at the instance level, the strategy has been used to discover parts within a category [88].

Finally, a number of recent works discover PnAs within the framework of deep CNNs for fine-grained recognition [12, 57, 111, 112]. Although these methods have been very successful, they bring a new set of challenges. One of them is training models for a new domain when limited labeled data is available. Factorization of the appearance using parts and attributes, either using labels provided explicitly through annotations, or implicitly in the model, continues to be the method of choice for such situations.