
1 Introduction and Literature Review

Many real-world applications require classifying “entities” that have not been encountered before, e.g., object recognition (where every object is a category) or cross-lingual dictionary induction (where every word is a category). One of the underlying reasons is the lack of resources to annotate available (and possibly systematically growing) datasets. To address this problem, zero-shot learning has been proposed.

While multiple zero-shot learning approaches have been introduced [9, 19], as of today, none of them has emerged as “the best”. In situations like this, meta-classifiers, which “receive suggestions” from individual classifiers and “judge” their value to select a “winner”, can be explored. The assumption is that such meta-classifiers will perform better than the individual ones.

Let us start from the formal problem formulation. Given a dataset of image embeddings \(\mathcal {X}=\{(x_i,y_i)\in \mathcal {X}\times \mathcal {Y}|i=1,...,N_{tr}+N_{te}\}\), each image is represented by a real D-dimensional embedding (feature) vector \(x_i\in \mathbb {R}^D\), and each class label is represented by an integer \(y_i\in \mathcal {Y}\equiv \{1,...,N_0,N_0+1,...,N_0+N_1\}\), giving \(N_0+N_1\) distinct classes. Here, for generality, it is assumed that . The dataset \(\mathcal {X}\) is divided into two subsets: (1) the training set and (2) the test set. The training set is given by \(X^{tr}=\{(x_i^{tr},y_i^{tr})\in \mathcal {X}\times \mathcal {Y}_0|i=1,...,N_{tr}\}\), where \(y_i^{tr}\in \mathcal {Y}_0\equiv \{1,...,N_0\}\), resulting in \(N_0\) distinct training classes. The test set is given by \(X^{te}=\{(x_i^{te},y_i^{te})\in \mathcal {X}\times \mathcal {Y}_1|i=N_{tr}+1,...,N_{tr}+N_{te}\}\), where \(y_i^{te}\in \mathcal {Y}_1\equiv \{N_0+1,...,N_0+N_1\}\), providing \(N_1\) distinct test classes.
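For concreteness, a minimal sketch of how such a split with disjoint seen/unseen class sets can be formed is shown below; the array names and the NumPy formulation are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

# Hypothetical arrays: X holds the D-dimensional image embeddings (one row per image),
# y holds the integer class labels. Classes 1..N0 are "seen" (training) classes;
# classes N0+1..N0+N1 are "unseen" (test) classes.
def zero_shot_split(X, y, N0):
    seen_mask = y <= N0            # instances of training classes Y_0 = {1, ..., N0}
    unseen_mask = ~seen_mask       # instances of test classes Y_1 = {N0+1, ..., N0+N1}
    X_tr, y_tr = X[seen_mask], y[seen_mask]
    X_te, y_te = X[unseen_mask], y[unseen_mask]
    return (X_tr, y_tr), (X_te, y_te)
```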

The goal of zero-shot learning is to train a model (on dataset \(X^{tr}\)) that performs “well” on the test dataset \(X^{te}\). Obviously, since \(\mathcal {Y}_0\cap \mathcal {Y}_1=\emptyset \), zero-shot learning requires auxiliary information associating the labels of the training and the test sets. The solution is to represent each class label y \((1\le y\le N_0+N_1)\) by its prototype (semantic embedding). Here, \(\pi (\cdot ):\mathcal {Y}_0\cup \mathcal {Y}_1\rightarrow \mathcal {P}\) is the prototyping function, and \(\mathcal {P}\) is the semantic embedding space. The prototype vectors are such that any two class labels \(y_0\) and \(y_1\) are similar if and only if their prototype representations \(\pi (y_0)=p_0\) and \(\pi (y_1)=p_1\) are close in the semantic embedding space \(\mathcal {P}\); for example, their inner product \(\langle \pi (y_0),\pi (y_1)\rangle _\mathcal {P}\) is large. Prototyping all class labels into a joint semantic space, i.e., \(\{\pi (y)|y\in \mathcal {Y}_0\cup \mathcal {Y}_1\}\), makes the labels mutually related. This resolves the problem of disjoint class sets: the model can learn from the labels in the training set and predict labels from the test set.
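As a toy illustration of how prototypes relate otherwise disjoint labels, consider attribute-based semantic embeddings; the class names and attribute vectors below are made up for illustration only.

```python
import numpy as np

# Made-up prototypes: each class is described by semantic attributes,
# e.g., "has stripes", "has four legs", "lives in water".
prototypes = {
    "zebra":   np.array([1.0, 1.0, 0.0]),
    "horse":   np.array([0.0, 1.0, 0.0]),
    "dolphin": np.array([0.0, 0.0, 1.0]),
}

def similarity(a, b):
    """Inner product in the semantic space P; a large value indicates related classes."""
    return float(np.dot(prototypes[a], prototypes[b]))

print(similarity("zebra", "horse"))    # 1.0 -> related classes
print(similarity("zebra", "dolphin"))  # 0.0 -> unrelated classes
```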

Multiple algorithms have been proposed to solve the zero-shot learning problem. DeViSE [6], ALE [2], and SJE [3] use a bilinear compatibility function, trained with Stochastic Gradient Descent (SGD) and implicitly regularized by early stopping. ESZSL [12] uses a square loss to learn the bilinear compatibility function and explicitly defines regularization with respect to the Frobenius norm. Kodirov et al. [8] propose a semantic encoder-decoder model (SAE), where the training instances are projected into the semantic embedding space \(\mathcal {P}\), with the projection matrix W, and then projected back into the feature space \(\mathcal {X}\), with the conjugate transpose of the projection matrix \(W^*\). Another group of approaches adds a non-linearity component to the linear compatibility function [18]. A third set of approaches uses probabilistic mappings [9]. A fourth group of algorithms expresses the input image features and the semantic embeddings as a mixture of seen classes [21]. A fifth group includes both seen and unseen classes in the training data [20].
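To make the bilinear-compatibility idea concrete, the sketch below shows the well-known closed-form solution of ESZSL [12]; the matrix names, the regularization values, and the use of NumPy are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def eszsl_train(X, S, Y, gamma=1.0, lam=1.0):
    """Closed-form ESZSL solution.
    X: D x N matrix of training features, S: a x C matrix of class prototypes,
    Y: N x C ground-truth matrix (one column per seen class)."""
    D, a = X.shape[0], S.shape[0]
    A = np.linalg.inv(X @ X.T + gamma * np.eye(D))  # (X X^T + gamma I)^-1
    B = np.linalg.inv(S @ S.T + lam * np.eye(a))    # (S S^T + lambda I)^-1
    return A @ X @ Y @ S.T @ B                      # D x a compatibility matrix W

def eszsl_predict(W, x, S_unseen):
    """Score unseen classes with the bilinear compatibility x^T W s and pick the best."""
    scores = x @ W @ S_unseen                       # S_unseen: a x C_unseen
    return int(np.argmax(scores))
```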

In this context, a comparison of five state-of-the-art zero-shot learning approaches, applied to five popular benchmarking datasets, is presented. Next, explorations into meta-classifiers for zero-shot learning are reported. An extended version of this work, with additional details and results, can be found in [14].

2 Selection of Methods and Experimental Setup

Based on the analysis of the literature, five robust zero-shot learning approaches were selected: (1) DeViSE, (2) ALE, (3) SJE, (4) ESZSL, and (5) SAE. Moreover, the following datasets, popular in the literature, have been picked: (a) CUB [17], (b) AWA1 [9], (c) AWA2 [19], (d) aPY [5], and (e) SUN [11]. Finally, six standard meta-classifiers have been tried: (A) Meta-Decision Tree MDT [16], (B) deep neural network DNN [7], (C) Game Theory-based approach GT [1], (D) Auction-based model Auc [1], (E) Consensus-based approach Con [4], and (F) a simple majority voting MV [13]. Here, meta-classifiers (C), (D), (E), and (F) have been implemented following the cited literature. The implementation of (A), however, differs from the one described in [16] by not applying the weight condition on the classifiers; the effects of this simplification can be explored in the future. Finally, the DNN has two hidden layers and an output layer, all of which use the rectified linear activation function; it is optimized with SGD and the mean squared error loss function. All code and the complete list of hyperparameter values for the individual classifiers and the meta-classifiers can be found in the GitHub repository (see Footnote 1). While the hyperparameter values were obtained through multiple experiments, no claim about their optimality is made. The following standard measures have been used to evaluate the performance of the explored approaches: (T1) Top-1, (T5) Top-5, (LogLoss) Logarithmic Loss, and (F1) F1 score. Their definitions can be found in [10, 15, 19].
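A minimal sketch of the DNN meta-classifier described above is given below. The hidden-layer widths and the assumption that the input is the concatenation of the per-class score vectors produced by the five individual classifiers are illustrative choices; the exact values used in the experiments are in the GitHub repository.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_meta_dnn(input_dim, n_classes):
    """Two hidden layers and an output layer, all with ReLU activation,
    trained with SGD and the mean squared error loss (as described in the text)."""
    model = keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(128, activation="relu"),          # first hidden layer (width assumed)
        layers.Dense(64, activation="relu"),           # second hidden layer (width assumed)
        layers.Dense(n_classes, activation="relu"),    # output layer
    ])
    model.compile(optimizer="sgd", loss="mean_squared_error")
    return model
```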

Separately, the comparison with results reported in [19] has to be addressed. To the best of our knowledge, the code used there is not publicly available. Thus, the best effort was made to re-implement the methods from [19]. At this stage, the known differences are: (1) the feature vectors and semantic embedding vectors provided with the datasets were used, instead of calculated ones; (2) the dataset splits used in our code follow the splits proposed in [19], instead of the “standard splits”. Nevertheless, we believe that the results presented herein fairly represent the work reported in [19].

3 Experiments with Individual Classifiers

The first set of experimental results was obtained using the five classifiers applied to the five benchmark datasets. Table 1 reports the results available from [2, 3, 6, 8, 12] (column O), the results based on [19] (column R), and the results of the in-house implementations of the five models (column I). Overall, all classifiers performed “badly” when applied to the aPY dataset. Next, comparing columns R and I, out of 25 results, the methods based on [19] are somewhat more accurate in 15 cases. Since the performances are close, and one could claim that our implementation of the methods from [19] is “questionable”, from here on only results based on the “in-house” implementations of the zero-shot learning algorithms are reported.

Table 1. Individual classifier performance for the Top-1 accuracy

While the Top-1 performance measure is focused on the “key class”, other performance measures have been tried. In [14], performance measured using Top-5, LogLoss, and the F1 score has been reported. Overall, it can be stated that (A) different performance measures “promote” different zero-shot learning approaches; (B) aPY is the “most difficult dataset” regardless of the measure; (C) no individual classifier is a clear winner. Therefore, a simplistic method has been proposed to gain a better understanding of the “overall strength” of each classifier; however, what follows “should be treated with a grain of salt”. Here, individual classifiers score points ranging from 5 to 1 (from best to worst), based on the accuracy obtained for each dataset. This process is applied to all four accuracy measures. The combined scores are reported in Table 2. Interestingly, SAE is the weakest method both for the individual datasets and for the overall performance. The combined performances of ALE, SJE, and ESZSL are very similar.
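A sketch of this simple ranking scheme is shown below. The data layout and the explicit handling of LogLoss (where lower values are better) are assumptions made for illustration.

```python
# accuracy[measure][dataset][classifier] is assumed to hold the value achieved by
# one classifier on one dataset under one measure.
def combined_scores(accuracy, classifiers, lower_is_better=("LogLoss",)):
    scores = {c: 0 for c in classifiers}
    for measure, per_dataset in accuracy.items():
        reverse = measure not in lower_is_better   # for LogLoss, smaller is better
        for results in per_dataset.values():
            # the best classifier receives 5 points, ..., the worst receives 1 point
            ranked = sorted(classifiers, key=lambda c: results[c], reverse=reverse)
            for points, c in zip(range(5, 0, -1), ranked):
                scores[c] += points
    return scores
```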

Table 2. Individual classifier combined performance; “winners” marked in bold font.
Table 3. Analysis of Instance Difficulty (represented in %)

3.1 Analysis of the Datasets

Since it became clear that the performance of the classifiers is directly related to the datasets, their “difficulty” has been explored. Here, an instance (in a dataset) is classified as lvl 0 if no classifier identified it correctly, and as lvl 5 if it was recognized by all classifiers. The results in Table 3 show how many instances (in %) belong to each category, for each dataset. The aPY dataset has the largest percentage of instances that have not been recognized at all (36.37%), or only by one or two classifiers (jointly, 41.51%). At the same time, only 0.85% of its instances have been recognized by all classifiers. The SUN dataset is the easiest: 27.85% of its instances were correctly recognized by all classifiers, and about 55% of its instances are “relatively easy”.
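A minimal sketch of this instance-difficulty analysis is given below; the data structures are assumed for illustration.

```python
from collections import Counter

# predictions[c] is assumed to be the list of labels predicted by classifier c;
# y_true holds the ground-truth labels. An instance is at level k when exactly
# k of the five classifiers predicted its label correctly.
def difficulty_levels(predictions, y_true):
    levels = Counter()
    for i, true_label in enumerate(y_true):
        correct = sum(preds[i] == true_label for preds in predictions.values())
        levels[correct] += 1
    total = len(y_true)
    return {lvl: 100.0 * count / total for lvl, count in sorted(levels.items())}
```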

Approaching the issue from a different perspective, the “influence” of individual attributes has been “scored”. For each correctly recognized instance, its attributes have been given +1 “point”. For each incorrectly recognized instance, its attributes were given −1 “point”. This measure captures which attributes are the easiest/hardest to recognize. The obtained results can be found in Table 4. The most interesting observation is that the attributes “has eye color black” and “metal” are associated with so many instances that they are classified (in)correctly regardless of whether they actually “influenced” the “decision” of the classifier.
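A sketch of this attribute scoring, for a single classifier's predictions, could look as follows; the input format (per-instance attribute sets) is an assumption.

```python
from collections import defaultdict

# instance_attrs[i] is assumed to be the set of attribute names of instance i;
# y_pred and y_true are the predicted and ground-truth labels.
def attribute_scores(instance_attrs, y_pred, y_true):
    scores = defaultdict(int)
    for attrs, pred, true_label in zip(instance_attrs, y_pred, y_true):
        delta = 1 if pred == true_label else -1   # +1 for correct, -1 for incorrect
        for attribute in attrs:
            scores[attribute] += delta
    return dict(scores)
```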

Table 4. Analysis of the datasets

4 Meta-Classifiers

Let us now move to the meta-classifiers. Here, note that the number of hard instances found in each dataset (see Sect. 3.1) establishes a hard ceiling for DNN, MDT, and MV: if not a single classifier gave the correct answer, the combination produced by these approaches will also “fail”. In Table 5, the performance of the six meta-classifiers is compared using the Top-1 measure, where the Best row denotes the best result obtained by the “winning” individual classifier for a given dataset (see Table 1). Results for the F1 score can be found in [14].
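A minimal sketch of the majority-voting (MV) meta-classifier illustrates the ceiling: if no individual classifier is correct for an instance, the fused prediction cannot be correct either. The data layout is assumed for illustration.

```python
from collections import Counter

# per_classifier_predictions[c] is assumed to be the list of labels predicted by
# classifier c; ties are broken by the first label reaching the highest count.
def majority_vote(per_classifier_predictions):
    n_instances = len(next(iter(per_classifier_predictions.values())))
    fused = []
    for i in range(n_instances):
        votes = [preds[i] for preds in per_classifier_predictions.values()]
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused
```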

Table 5. Meta-classifier performance; Top-1 accuracy

Comparing the results, one can see that: (a) the best individual classifier performed better than the best meta-classifier on CUB, AWA2, and aPY (by 2.91%, 0.88%, and 3.59%, respectively); (b) the best meta-classifier performed better than the best individual classifier on the AWA1 and SUN datasets (by 0.08% in both cases).

Table 6. Meta-classifier and individual classifier combined performance

Finally, the “scoring method” was applied jointly to the meta-classifiers and the individual classifiers, for the Top-1 and the F1 score accuracy measures. Since 11 classifiers were compared, the top score was 11 points. Table 6 displays the results. It can be noticed that: (1) the meta-classifiers performed better than the individual classifiers (averaging 77.83 vs. 74.6 points); (2) combining the results of the individual classifiers using a simple majority voting algorithm brought the best results; (3) at the same time, the use of basic versions of more advanced meta-classifiers does not lead to immediate performance gains.

5 Concluding Remarks

In this contribution, the performance of five zero-shot learning models, applied to popular benchmarking datasets, has been studied. Moreover, the “nature of the difficulty” of these datasets has been explored. Finally, six standard meta-classifiers have been experimented with. The main findings are: (1) there is no single best classifier, and results depend on the dataset and the performance measure; (2) the aPY dataset is the most difficult for zero-shot learning; (3) standard meta-classifiers may bring some benefits; (4) the simplest methods obtained the best results (i.e., the individual classifier ESZSL and the meta-classifier MV). The obtained prediction accuracy (less than 70%) suggests that a lot of research is needed on both the individual classifiers and, possibly, the meta-classifiers. Moreover, datasets similar to aPY, on which it is particularly difficult to achieve good performance, should be used. Finally, some attention should be devoted to the role that individual attributes play in class (instance) recognition.