Abstract
The traditional supervised learning models rely on high-quality labeled samples heavily. In many fields, training the model on limited labeled samples will result in a weak generalization ability of the model. To address this problem, we propose a novel few-shot image classification method by self-supervised and metric learning, which contains two training steps: (1) Training the feature extractor and projection head with strong representational ability by self-supervised technology; (2) taking the trained feature extractor and projection head as the initialization meta-learning model, and fine-tuning the meta-learning model by the proposed loss functions. Specifically, we construct the pairwise-sample meta loss (ML) to consider the influence of each sample on the target sample in the feature space, and propose a novel regularization technique named resistance regularization based on pairwise-samples which is utilized as an auxiliary loss in the meta-learning model. The model performance is evaluated on the 5-way 1-shot and 5-way 5-shot classification tasks of mini-ImageNet and tired-ImageNet. The results demonstrate that the proposed method achieves the state-of-the-art performance.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
1 Introduction
The supervised learning methods rely on a large number of manually labeled samples. In many fields, the lack of labeled samples limits the reliability and generalization ability of the model. Few-shot learning is proposed to address this problem, which aims to enable the model to classify the new classes that do not appear in the training set with only a few annotations [1].
The essence of few-shot learning tasks are to solve cross-domain problems. Chen et al. [2] find that the meta-learning methods lose their advantages when the domain difference is too large. The same view is also put forward in [3], Guo et al. find that there is a little difference in the performance of different meta-learning models in the same domain, but the performance of one meta-learning model in different domains is significantly different. This phenomenon is called a cross-domain problem [4]. At present, many researchers have discovered that learning an excellent feature encoder can greatly improve the model performance [5]. Therefore, more and more researchers focus on learning an initialization feature encoder with strong generalization ability to solve the cross-domain problem.
In this work, we propose a novel few-shot classification framework based on metric learning and self-supervised learning, which consists of a classification model and a meta-learning model. Specifically, the classification model is trained on the base class set to obtain the feature extractor with strong extraction ability, and then this feature extractor is utilized as the initial feature encoder of the meta-learning model to evaluate on the new classes set [6, 7]. The contributions are as follows:
-
We add the rotation self-supervised auxiliary loss into the classification network, which aims to improve the feature representation ability of the model.
-
We propose the meta loss (ML) based on pairwise-samples, which aims to reduce the intra-class difference and increase the inter-class difference.
-
We propose a new regularization technique named resistance regularization, which could improve the generalization ability of the model. The resistance regularization includes the exchange processing and NT-Xent (normalized temperature-scaled cross entropy) loss.
2 Related work
2.1 Meta-learning
Meta-learning could generalize the previously learned knowledge or experience to many new tasks autonomously and quickly [8]. For example, the meta transfer learning (MTL) is proposed in [6], which combines the hard task (HT) meta-batch scheme to force the meta-learner to “grow up in difficulties”. Task-aware feature embedding network (TAFE-Net) is proposed in [9] to obtain the task aware embedding for few-shot classification tasks. Latent embedding optimization (LEO) is introduced in [10], which applies a parameter generation model to capture useful parameters for the tasks. Chen et al. [2] propose that the performance of the model is related to the domain difference, and the performance of the shallow network is better than that of other deep backbones when the domain difference is small. In addition, a new meta-learning method is proposed by dual formulation and KKT conditions in [11] to improve the computational efficiency.
2.2 Metric learning
Metric learning aims to reduce the intra-class difference and increase the inter-class difference, it is widely used in many fileds. Recently, the deep metric learning losses are built on pairwise-samples. For example, a novel hierarchical triplet loss (HTL) is proposed to automatically collect informative training samples in [12]. Riplet center loss is proposed in [13], which could further enhance the distinctiveness of features. Multi-class n-pair loss is proposed in [14] to solve the slow convergence of the contrastive loss and triplet loss. A new angle loss is proposed in [15], which aims to learn valuable features by considering the angle relationship of samples. Wu et al. [16] put forward that the selection of training samples plays an equally important role in the training of the model, and propose a novel sampling method by distance weighting. In our work, we propose the meta loss (ML) based on the pairwise-samples, which is utilized in the meta-learning model to consider the influence of other samples on the target sample in the feature space.
Classification by metric learning is performed in two steps. First, the eigenvector centroid βi of the class i is calculated by formula (1), where K is the number of input samples; secondly, calculating the similarity score pi between the predicted sample x and the centroid in formula (2). The category with the highest similarity score is regarded as the category of the target sample x.
2.3 Self-supervised learning
Self-supervised learning aims to mine the supervision information from large-scale unlabeled data by many auxiliary tasks, which could help the model capture more valuable features. Doersch et al. [17] construct a new auxiliary loss by predicting the context of the image, and the same work has been carried out in [18] and [19]. The auxiliary task of predicting the color of an image is designed in [20, 21], which is utilized to extract the semantic information. Gidaris et al. [22] propose a rotation loss to predict the rotation angle of the image, which could improve the robustness of the model. Hjelm et al. [23] design an auxiliary task to distinguish between the global feature and local feature of the image. Tian et al. [24] propose to construct samples by multi-perspective information. Chen et al. propose SimCLR in [25], it designs the auxiliary tasks by augmenting the input samples. At present, many researchers [26, 27] put forward to combine these self-supervised auxiliary tasks into the classification networks, which could greatly improve the performance of the models. In view of the above mentioned, the rotation self-supervised loss is applied in our classification network to obtain an initialized feature encoder with strong representational ability.
3 Method
3.1 The overall framework
The framework of our self-supervised pairwise-sample resistance model (SPRM) is shown in Fig. 1, which consists of a classification model and a meta-learning model. The fc1 and fc2 are the two classification layers, f𝜃 is the feature encoder, and fϕ is the projection head. l(⋅) is the cross-entropy loss, and the rotation self-supervised loss (\({\mathscr{L}}_{R}\)) is added to the classification model as an auxiliary loss, and the trained feature encoder of the classification model is used as the initial feature encoder of the meta-learning model; the meta loss (\({\mathscr{L}}_{ML}\)) is proposed based on the pairwise-samples, and it is applied in the meta-learning model; inspired by SimCLR, a new regularization technique called resistance regularization (\({\mathscr{L}}_{NT}\)) is proposed for few-shot learning, and it is applied into the ML as a regularization term.
The algorithm flow of SPRM is shown as Algorithm 1, it is divided into two training steps as follows:
Step 1: Training the classification model by \(l(\cdot )+{\mathscr{L}}_{R}\); training the projection head by \(l(\cdot )+ {\mathscr{L}}_{NT}\).
Step 2: Training the meta-learning model by \({\mathscr{L}}_{ML}+ {\mathscr{L}}_{NT}\). The resistance regularization \({\mathscr{L}}_{NT}\) is utilized as the regularization term in the meta loss \({\mathscr{L}}_{ML}\).
3.2 Rotation self-supervised loss
The rotation self-supervised loss [22] is utilized to increase the feature extraction ability and robustness of the classification model. Specifically, rotating the input image at four angles of 0∘, 90∘, 180∘ and 270∘, so the four images can be obtained by one image. The task of the model is to predict the rotation angles of these rotated images. The self-supervised loss function is defined as:
where M is the number of input samples and C is the number of the rotation angles, and C is 4 in our network. xi,j represents the i-th sample with the rotation angle j.
3.3 Meta loss
Wang et al. [28] propose the MS loss combined with the self-similarity S, positive relative similarity P and negative relative similarity N, the calculation is shown as:
where \(\mathcal {P}\) and \(\mathcal {N}\) are the positive and negative samples respectively, xi is the anchor, xk is the k-th sample which needs to be predicted, λ is the similarity threshold, and α and β are the hyperparameters, which are set by experience. α controls the compactness of the positive samples and penalizes the positive samples whose cosine similarity is less than λ; β controls the compactness of negative samples and penalizes the positive samples whose cosine similarity is greater than λ.
Meta-learning aims to predict the query set samples by using the support set samples. However, MS loss sets each sample as the anchor in turn, which will lead to the situation that the samples in the query set are used to predict the other samples, it is incompatible with the meta-learning training paradigm.
According to all the above, we propose the meta loss (ML) for few-shot learning, which only takes the centroid of the support set samples as the anchor. The calculation formula of ML is shown as:
where the N is the number of categories, and μ and η are added in ML to constrain the positive and negative pairwise-samples, see Section 3.3.2 for details. It can be seen from formula (4) and formula (6) that the i of MS traverses all samples from 1 to M, while the a of ML traverses from 1 to N. That is, in the N-way tasks, ML only uses the centroid of the support set samples as the anchor, which not only improves the calculation efficiency of the model, but also satisfies the principle that only the support set samples are used as the anchors in the meta-learning model. ML contains two steps: mining and weighting the pairwise-samples, which are shown in Fig. 2.
3.3.1 Mining the pairwise-samples
ML aims to reduce the computational effort of the model by mining more valuable pairwise-samples, which are the negative pairwise-samples (different categories) with large similarity score and the positive pairwise-samples (the same category) with small similarity score.
Inspired by LMNN [29] and MS loss [28], the positive relative similarity P is used to mine the difficult pairwise-samples by formulas 7 and 8, and the other pairwise-samples with less information are discarded.
where \(S_{a\mathcal {P}}\) and \(S_{a\mathcal {N}}\) are the similarity score maps of positive and negative pairwise-samples, respectively, 𝜖 is the threshold for mining, and \(S_{ai}^{-}\) is the cosine similarity range of the mined negative pairwise-samples, \(S_{aj}^{+}\) is the cosine similarity range of the mined positive pairwise-samples. The mining process of ML is shown in Table 1.
3.3.2 Weighting the pairwise-samples
The valuable pairwise-samples can be roughly mined by positive relative similarity P, and then these valuable pairwise-samples can be further weighted by self-similarity S and negative relative similarity N. Specifically, given a negative pairwise-sample \(\{ x_{a}, x_{i}\}, i \in \mathcal {N}\), the weight wai (partial derivative of Sai in formula 7) is calculated by formula 9, which penalizes the negative samples with cosine similarity > λ, given the positive pairwise-samples \(\{ x_{a}, x_{j}\}, j \in \mathcal {P}\), the weight calculation is shown in formula 10, which penalizes the positive samples with cosine similarity < η. The parameter μ is utilized to adjust the proportion of self-similarity S.
3.4 Resistance regularization
Resistance regularization contains the exchange processing and NT-Xent loss. Before that, we need to construct the pairwise-sample labels.
In a N-way K-shot M-query task, there are N classes, each with K + M images, and a total of N × (K + M) input images. Coping each image, then the 2N × (K + M) input images will be obtained. Regarding the real label of each image, we suppose the N × (K + M) original images are the N × (K + M) different categories. In fact, there are 2 × (K + M) images in each category after copying, but in the pairwise-sample scenario, there are only two images (the original and its copied image) in each category. For example, there is a 3-way 1-shot 1-query meta-learning task, the original input samples are [dog1, dog2, cat1, cat2, pig1, pig2], their coped samples are [Dog1, Dog2, Cat1, Cat2, Pig1, Pig2], so the input data is expanded to [dog1, dog2, cat1, cat2, pig1, pig2, Dog1, Dog2, Cat1, Cat2, Pig1, Pig2] after copying. Pairwise-sample labels are constructed in Fig. 3 (a), we mark the pairwise-sample with the same label as 1, otherwise 0.
3.4.1 Exchange processing
After the pairwise-sample labels are constructed, the labels are fixed and the positions of the 2N × (K + M) input images are exchanged, as shown in Fig. 3 (b). The self-pairwise-sample labels are deleted. After exchanging, for the areas where the pairwise-sample label are not zero, if the real labels of the two samples in one pairwise-sample are the same, it is called soft exchanging (represented as + 1), otherwise it is called hard exchanging (represented as -1). “+ 1” and “-1” are for the convenience of differentiation, and they are regarded as 1 when calculating the NT-Xent loss. In Fig. 3 (b), an example of soft exchanging is the pairwise-sample {dog2, Dog1}; An example of hard exchanging is the pairwise-sample {dog1, Pig2}.
To sum up, both soft and hard exchanging are designed to hinder the further learning of the model. Specifically, taking the pairwise-sample {dog2, Dog1} as an example, the pairwise-sample label of them is “+ 1” and the real labels of them are the same, which allows the model to narrow the intra-class difference. But for the pairwise-sample {dog2, Dog2}, the pairwise-sample label of them is “0” and the real labels of them are the same, which will prevent the model from learning similar characteristics. Therefore, the soft exchanging will correctly increase the similarity of pairwise-samples {dog2, Dog1}; incorrectly decrease the similarity of pairwise-samples {dog2, Dog2}, {dog2, dog1}. Similarly, taking the pairwise-sample {dog1, Pig2} as an example, hard exchanging will correctly decrease the similarity of the pairwise-samples {pig2, cat1}, and incorrectly increase the similarity of the pairwise-samples {dog1, Pig2}, {cat1, Dog2} and {pig1, Cat2}. Both the soft and hard exchanging could correctly decrease the similarity between the two samples whose the real labels are different, but the hard exchanging has a greater hindrance to the training of the model compared with soft exchanging.
The effects of the proportion of the soft and hard exchanging on the model performance are explored in Table 2 by comparative experiments, where “Soft%” represents the proportion of the soft exchanging.
Considering that the appropriate range of soft exchanging proportion can enhance the generalization ability of the model, the soft exchanging proportion of our method is randomly selected between 52.5% and 86.25%.
3.4.2 NT-Xent loss
NT-Xent loss is proposed in SimCLR, the calculation is shown in formula 11. Zi and Zj represent the eigenvectors of an original and its copied image obtained from the feature extractor, respectively; Zk is the eigenvector of the k-th image (k≠i) obtained from the feature extractor. The NT-Xent loss is added as the auxiliary term in the ML.
4 Experiments
4.1 Datasets
Mini-ImageNet
mini-ImageNet [1] dataset is composed of 60000 images selected from ImageNet, a total of 100 categories. There are 600 images in each category, and the size of each image is 84 × 84. It is usually divided into the base class set (64 categories), validation set (16 categories) and new class set (20 categories).
Tiered-ImageNet
tiered-ImageNet [30] dataset is also selected from ImageNet. It contains 34 super-categories, each super-category contains 10-30 classes, a total of 608 classes and 779,165 images. The 34 super-categories can be divided into the base class set (20 super-categories), validation set (6 super-categories) and new class set (8 super-categories).
4.2 Implementation details
The models are trained on the base class set, and then evaluated on the new class set. The model is implemented by Python 3.8 with CUDA 11.0. The two NVIDIA GeForce RTX 2080 Ti GPUs are utilized. Some hyperparameters of the models are shown in Table 3.
The evaluation indicator of this experiment is the confidence interval (z = 1.96) of the average precision P of M samples at the 95% confidence level, i.e. P ± Rinterval. The calculation of the confidence interval radius Rinterval is shown as:
5 Results and discussion
5.1 The performance evaluation of SPRM
In this paper, different few-shot learning methods are compared on mini-ImageNet and tiered-ImageNet dataset. The experimental results are shown in Tables 4 and 5, where “Our-self” is only the classification model with the self-supervised technology, and “Our-self-ML” is the meta-learning model combined with the self-supervised classification model, which uses the ML without adding resistance regularization term.
In Tables 4 and 5, the classification accuracy of SPRM on the 5-way 1-shot task of mini-ImageNet reaches 66.35%, and it reaches 82.24% on the 5-way 5-shot task, which demonstrates the model has better performance than other few-shot learning methods. The classification accuracy of SPRM on the 5-way 1-shot task of tired-ImageNet reaches 70.70%, and it reaches 85.40% on the 5-way 5-shot task. These results show that the SPRM has excellent performance and generalization.
5.2 Ablation study
The effect of the three technologies (including the rotation self-supervised loss, ML and resistance regularization) are studied in the ablation experiments. The results are shown in Table 6. When the resistance regularization is used alone, it can be regarded as a loss function; when the ML and resistance regularization are not used, the cross-entropy loss function is used as the loss function; when these three technologies are not used, the framework does not contain the meta-learning model.
In Table 6, the performance of our proposed method is the best. The application of the rotation self-supervised loss in the classification model can greatly improve the model performance. Compared with the cross-entropy loss, ML has obvious improvement on the classification tasks. When the resistance regularization is used as a loss function alone, it can also increase the prediction accuracy of the few-shot classification model. The ML with the resistance regularization has the best performance when the self-supervised technique is not considered. To sum up, the three proposed techniques all have improved the performance of the model.
In addition, the effect of the mining and weighting technologies of the MS loss and ML are also explored in the ablation study. The evaluation results on mini-ImageNet and tiered-ImageNet dataset are shown in Tables 7 and 8, respectively.
In Tables 7 and 8, the prediction accuracy of ML is the highest, and both the mining and weighting strategy in ML have a positive gain on the model performance. Comparing the results of and in these two tables, the effect of the mining strategy in ML is better than that in MS loss. Comparing the experimental results of and in these two tables, the weighting strategy of ML can improve the model performance in most cases. In addition, according to and , the calculation time of the mining strategy in ML has greatly reduced. To sum up, the mining strategy of ML can not only enhance the model performance, but also improve the computational efficiency.
5.3 Visualization analysis
The heat maps of different methods are visualized by Grad-CAM [56], they are shown in Fig. 4, the feature encoder pays more attention to the warm tone region and ignores the cold tone region. The different methods described in Fig. 4 are shown in Table 9.
In Fig. 4, for “None” and “Self”, when there are many objects in the image, the attention of the feature encoder is easily influenced by the interfering objects; when there are few objects in the image and the target is large, the feature encoder can quickly notice the target, but the attention area is small, the model cannot obtain the complete semantic information. Compared with the feature encoders of “Self” and “None”, SPRM can pay more attention to the whole target and capture more complete semantic information.
The feature vectors of several categories in mini-ImageNet and tiered-ImageNet dataset are shown in Figs. 5 and 6 by UMAP [57], respectively. In Figs. 5 and 6, the feature vectors of the same category are more compact in “Self” than that in “None”, which proves that the rotation self-supervised loss can improve the feature representation ability of the classification model. The same phenomenon occurs in “ML+Rr”, which further confirms that the ML can reduce the intra-class difference and expand the inter-class difference; the SPRM method combines the advantages of the rotation self-supervised loss, ML and resistance regularization to optimize the decision boundary and improve the model performance.
6 Conclusion
In this paper we propose a new few-shot classification model named self-supervised pairwise-sample resistance model (SPRM). It contains a classification model and a meta-learning model. The rotation self-supervised loss is utilized as an auxiliary loss in the classification model to obtain the feature extractor with strong representational ability, which is used as an initialize feature extractor in the meta-learning model; and the meta loss (ML) and resistance regularization are proposed and applied in the meta-learning model to improve the model performance. SPRM is evaluated on the 5-way 1-shot and 5-way 5-shot tasks of mini-ImageNet and tired-ImageNet. The experimental results indicate that our method is superior to the other advanced methods in few-shot classification tasks.
References
Vinyals O, Blundell C, Lillicrap T, Wierstra D, et al. (2016) Matching networks for one shot learning. Adv Neural Inf Process Syst 29
Chen W-Y, Liu Y-C, Kira Z, Wang Y-C F, Huang J-B (2019) A closer look at few-shot classification. arXiv:1904.04232
Guo Y, Codella NC, Karlinsky L, Codella JV, Smith JR, Saenko K, Rosing T, Feris R (2020) A broader study of cross-domain few-shot learning. In: European conference on computer vision, Springer pp 124–141
Tseng H-Y, Lee H-Y, Huang J-B, Yang M-H (2020) Cross-domain few-shot classification via learned feature-wise transformation. arXiv:2001.08735
Jaiswal A, Babu AR, Zadeh MZ, Banerjee D, Makedon F (2020) A survey on contrastive self-supervised learning. Technologies 9(1):2
Sun Q, Liu Y, Chua T-S, Schiele B (2019) Meta-transfer learning for few-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 403–412
Chen Y, Wang X, Liu Z, Xu H, Darrell T (2020) A new meta-baseline for few-shot learning
Lake B, Salakhutdinov R, Gross J, Tenenbaum J (2011) One shot learning of simple visual concepts. In: Proceedings of the annual meeting of the cognitive science society, vol 33
Wang X, Yu F, Wang R, Darrell T, Gonzalez JE (2019) Tafe-net : Task-aware feature embeddings for low shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1831–1840
Rusu AA, Rao D, Sygnowski J, Vinyals O, Pascanu R, Osindero S, Hadsell R (2018) Meta-learning with latent embedding optimization. arXiv preprint arXiv:1807.05960
Lee K, Maji S, Ravichandran A, Soatto S (2019) Meta-learning with differentiable convex optimization. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10657–10665
Ge W (2018) Deep metric learning with hierarchical triplet loss. In: Proceedings of the European conference on computer vision (ECCV), pp 269–285
He X, Zhou Y, Zhou Z, Bai S, Bai X (2018) Triplet-center loss for multi-view 3d object retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1945–1954
Sohn K (2016) Improved deep metric learning with multi-class n-pair loss objective. Adv Neural Inf Process Syst 29
Wang J, Zhou F, Wen S, Liu X, Lin Y (2017) Deep metric learning with angular loss. In: Proceedings of the IEEE international conference on computer vision, pp 2593–2601
Wu C-Y, Manmatha R, Smola A J, Krahenbuhl P (2017) Sampling matters in deep embedding learning. In: Proceedings of the IEEE international conference on computer vision, pp 2840– 2848
Doersch C, Gupta A, Efros AA (2015) Unsupervised visual representation learning by context prediction. In: Proceedings of the IEEE international conference on computer vision, pp 1422–1430
Noroozi M, Favaro P (2016) Unsupervised learning of visual representations by solving jigsaw puzzles. In: European conference on computer vision, Springer, pp 69–84
Pathak D, Krahenbuhl P, Donahue J, Darrell T, Efros AA (2016) Context encoders : feature learning by inpainting. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2536–2544
Zhang R, Isola P, Efros AA (2016) Colorful image colorization. In: European conference on computer vision, Springer, pp 649–666
Zhang R, Isola P, Efros A A (2017) Split-brain autoencoders : unsupervised learning by cross-channel prediction. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1058–1067
Gidaris S, Singh P, Komodakis N (2018) Unsupervised representation learning by predicting image rotations. arXiv:1803.07728
Hjelm R D, Fedorov A, Lavoie-Marchildon S, Grewal K, Bachman P, Trischler A, bengio Y (2018)
Tian Y, Krishnan D, Isola P (2020) Contrastive multiview coding. In: European conference on computer vision, Springer, pp 776–794
Chen T, Kornblith S, Norouzi M, Hinton G (2020) A simple framework for contrastive learning of visual representations. In: International conference on machine learning, PMLR, pp 1597–1607
Gidaris S, Bursuc A, Komodakis N, Pérez P, Cord M (2019) Boosting few-shot visual learning with self-supervision. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8059–8068
Lee H, Hwang SJ, Shin J (2019) Rethinking data augmentation. Self-supervision and self-distillation
Wang X, Han X, Huang W, Dong D, Scott MR (2019) Multi-similarity loss with general pair weighting for deep metric learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5022–5030
Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10(2)
Ren M, Triantafillou E, Ravi S, Snell J, Swersky K, Tenenbaum JB, Larochelle H, Zemel RS (2018) Meta-learning for semi-supervised few-shot classification. arXiv preprint arXiv:1803.00676
Li K, Zhang Y, Li K, Fu Y (2020) Adversarial feature hallucination networks for few-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13470–13479
Yu Z, Chen L, Cheng Z, Luo J (2020) Transmatch : a transfer-learning scheme for semi-supervised few-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12856–12864
Liu Y, Schiele B, Sun Q (2020) An ensemble of epoch-wise empirical bayes for few-shot learning. In: European conference on computer vision, Springer, pp 404–421
Liu B, Cao Y, Lin Y, Li Q, Zhang Z, Long M, Hu H (2020) Negative margin matters : understanding margin in few-shot classification. In: European conference on computer vision, Springer, pp 438–455
Simon C, Koniusz P, Nock R, Harandi M (2020) Adaptive subspaces for few-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4136–4145
Tian Y, Wang Y, Krishnan D, Tenenbaum JB, Isola P (2020) Rethinking few-shot image classification : a good embedding is all you need?. In: European conference on computer vision, Springer, pp 266–282
Kim J, Kim H, Kim G (2020) Model-agnostic boundary-adversarial sampling for test-time generalization in few-shot learning. In: European conference on computer vision, Springer, pp 599–617
Ye H-J, Hu H, Zhan D-C, Sha F (2020) Few-shot learning via embedding adaptation with set-to-set functions. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8808–8817
Dhillon GS, Chaudhari P, Ravichandran A, Soatto S (2019) A baseline for few-shot image classification. arXiv:1909.02729
Zhang C, Cai Y, Lin G, Shen C (2020) Deepemd : few-shot image classification with differentiable earth mover’s distance and structured classifiers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 12203–12213
Afrasiyabi A, Lalonde J-F, Gagné C (2020) Associative alignment for few-shot image classification. In: European conference on computer vision, Springer, pp 18–35
Laenen S, Bertinetto L (2021) On episodes, prototypical networks, and few-shot learning. Adv Neural Inf Process Syst 34:24581–24592
Afrasiyabi A, Lalonde J-F, Gagné C (2021) Mixture-based feature space learning for few-shot image classification. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 9041–9051
Chen Z, Ge J, Zhan H, Huang S, Wang D (2021) Pareto self-supervised training for few-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13663–13672
Shen Z, Liu Z, Qin J, Savvides M, Cheng K-T (2021) Partial is better than all : revisiting fine-tuning strategy for few-shot learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 35. pp 9594–9602
Xu W, Xu Y, Wang H, Tu Z (2021) Attentional constellation nets for few-shot learning. In: International conference on learning representations
Zhang H, Koniusz P, Jian S, Li H, Torr PH (2021) Rethinking class relations : absolute-relative supervised and unsupervised few-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 9432–9441
Hu Y, Gripon V, Pateux S (2021 ) Leveraging the feature distribution in transfer-based few-shot learning. In: International conference on artificial neural networks, Springer, pp 487–499
Qiao S, Liu C, Shen W, Yuille AL (2018) Few-shot image recognition by predicting parameters from activations. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7229–7238
Ravichandran A, Bhotika R, Soatto S (2019) Few-shot learning with embedded class models and shot-free meta training. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 331–339
Gidaris S, Komodakis N (2019) Generating classification weights with gnn denoising autoencoders for few-shot learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 21–30
Li H, Eigen D, Dodge S, Zeiler M, Wang X (2019) Finding task-relevant features for few-shot learning by category traversal. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1–10
Xing C, Rostamzadeh N, Oreshkin BO, Pinheiro PO (2019) Adaptive cross-modal few-shot learning. Adv Neural Inf Process Syst 32
Wu F, Smith JS, Lu W, Pang C, Zhang B (2020) Attentive prototype few-shot learning with capsule network-based embedding. In: European conference on computer vision, Springer, pp 237–253
Zhou Z, Qiu X, Xie J, Wu J, Zhang C (2021) Binocular mutual learning for improving few-shot classification. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 8402–8411
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam : visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision, pp 618–626
McInnes L, Healy J, Melville J (2018) Umap : uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426
Acknowledgements
This work was funded by the National Natural Science Foundation of China under Grant 51774219, Key R&D Projects in Hubei Province under grant 2020BAB098.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, W., Xie, L., Gan, P. et al. Self-supervised pairwise-sample resistance model for few-shot classification. Appl Intell 53, 20661–20674 (2023). https://doi.org/10.1007/s10489-023-04525-4
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-023-04525-4