
1 Introduction

In aviation settings, aircraft place extremely high demands on the ability to detect emergencies effectively and respond quickly. Image classification of aerial targets plays a very important role in this process. With growing emphasis on flight safety and the development of airspace defense, image classification technology for aviation targets is becoming increasingly important in both military and civilian fields.

Aerial target images are easily affected by the environment. For example, different target attitudes, flight speeds, and shooting angles all change how the target appears in the image, which limits traditional image classification methods. Image classification is now developing rapidly on the basis of deep learning classifiers, and deep learning classifiers need a large amount of sample data to tune and optimize the network. In the aviation field, however, data are difficult to obtain because of factors such as data privacy and security, high labeling costs, and the scarcity of important data, so in many application scenarios it is impossible to obtain a large number of labeled samples. When the amount of data is very limited, a traditional deep learning classifier easily over-fits, which degrades its classification ability and leads to unsatisfactory results. Enabling a system to learn effectively from a small number of samples is therefore very important. In recent years, researchers have proposed few-shot learning to address this kind of problem. The problem is especially prominent in the aviation field, so it is worth studying and summarizing.

This paper studies few-shot learning techniques for image classification in order to provide ideas for solving related problems in the aviation field.

In recent years, with the rapid development of deep learning, more and more few-shot learning methods based on deep learning models have emerged. These methods exploit the strengths of deep neural networks in feature extraction and representation and approach few-shot problems from different perspectives (Fig. 1).

Fig. 1. Conceptual diagram of research on few-shot learning for image classification

This paper summarizes these few-shot learning techniques, which fall into three categories: data-based, model-based, and optimization-based. Sections 2, 3, and 4 describe few-shot image classification techniques in these three research directions (Fig. 1), respectively. Section 5 presents the authors' understanding of and ideas about the development trends of few-shot learning in aerial image classification. Finally, Section 6 summarizes the research content and main ideas of this paper.

2 Few-Shot Learning Based on Data

Data augmentation is the most intuitive way to increase the number of training samples and the diversity of the data. To some extent, it helps prevent the network model from over-fitting and improves the robustness and generalization ability of the model.

2.1 Ordinary Data Augmentation Methods

For image data, the most commonly used geometric transformations are flipping, rotation, cropping, scaling, and translation [1].
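The geometric transformations listed above can be illustrated on a toy grayscale image stored as a nested list. This is a minimal pure-Python sketch for illustration only; the function names (`hflip`, `rotate90`, `crop`, `zoom`, `translate`) are hypothetical, and practical pipelines would use an image-processing library instead.

```python
# Toy grayscale "image": a list of rows, each a list of pixel intensities.
Image = list[list[int]]


def hflip(img: Image) -> Image:
    """Horizontal flip: mirror every row left-to-right."""
    return [row[::-1] for row in img]


def rotate90(img: Image) -> Image:
    """Rotate 90 degrees clockwise (reverse the rows, then transpose)."""
    return [list(col) for col in zip(*img[::-1])]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Keep the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def zoom(img: Image, factor: int) -> Image:
    """Integer up-scaling by nearest-neighbour pixel replication."""
    return [[px for px in row for _ in range(factor)]
            for row in img for _ in range(factor)]


def translate(img: Image, dx: int, dy: int, fill: int = 0) -> Image:
    """Shift content dx columns right and dy rows down; vacated pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            src_r, src_c = r - dy, c - dx
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = img[src_r][src_c]
    return out


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
print(hflip(img))               # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(rotate90(img))            # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
print(crop(img, 1, 1, 2, 2))    # [[5, 6], [8, 9]]
print(translate(img, 1, 0))     # [[0, 1, 2], [0, 4, 5], [0, 7, 8]]
```

Each call produces a new training sample with the same label as the original, which is why these transforms enlarge the data set without extra labeling cost.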

In addition, there are noise-based transformations. Random noise superimposes randomly generated noise, most commonly Gaussian noise, on the original image. Coarse Dropout removes the information in rectangular regions of a chosen size at random positions. Simplex Noise Alpha generates continuous noise masks and blends them with the original image. Frequency Noise Alpha weights noise in the frequency domain by a random exponent and then transforms it back to the spatial domain, producing speckled and cloud-like effects on the image. There are also blurring methods that reduce the differences between pixel values, such as Gaussian blur, as well as many other methods, such as randomly erasing a region of the image or rendering all edges in the image in black and white.
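Two of the noise transformations described above, additive Gaussian noise and Coarse Dropout, can be sketched in a few lines, together with a simple blur. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the dropout regions are assumed to be squares, and a 3x3 mean (box) filter is deliberately swapped in for Gaussian blur because it shows the same "reduce pixel differences" effect with less code.

```python
import random

Image = list[list[float]]


def gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Superimpose zero-mean Gaussian noise (std = sigma) on every pixel."""
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in img]


def coarse_dropout(img: Image, size: int, n_boxes: int,
                   rng: random.Random, fill: float = 0.0) -> Image:
    """Overwrite n_boxes square regions (side `size`) at random positions with `fill`."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for _ in range(n_boxes):
        top = rng.randrange(h - size + 1)
        left = rng.randrange(w - size + 1)
        for r in range(top, top + size):
            for c in range(left, left + size):
                out[r][c] = fill
    return out


def box_blur(img: Image) -> Image:
    """3x3 mean filter (stand-in for Gaussian blur): average each pixel with its neighbours."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [img[i][j]
                      for i in range(max(0, r - 1), min(h, r + 2))
                      for j in range(max(0, c - 1), min(w, c + 2))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


rng = random.Random(0)
img = [[1.0] * 8 for _ in range(8)]
noisy = gaussian_noise(img, sigma=0.1, rng=rng)
dropped = coarse_dropout(img, size=3, n_boxes=1, rng=rng)
print(sum(px == 0.0 for row in dropped for px in row))  # 9 pixels erased
```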

On their own, these simple augmentation methods cannot substantially improve the generalization ability of few-shot learning models. Therefore, some few-shot image classification problems may require more complex and effective augmentation models, algorithms, and networks.

2.2 Add Weakly Labeled or Unlabeled Data

Adding weakly labeled or unlabeled data means augmenting the training data with samples, selected from a weakly labeled or unlabeled data set, that are identical or similar to the target images. Such real-world weakly labeled or unlabeled data are much easier and cheaper to obtain.
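One simple way to "select samples identical or similar to the target image" is nearest-neighbour matching in a feature space, keeping only confident matches as pseudo-labeled training data. The sketch below assumes that feature vectors have already been extracted by some pretrained network; the similarity threshold, the class names, and the function `select_similar` are hypothetical and chosen only for illustration.

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def select_similar(labeled: dict[str, list[Vector]], unlabeled: list[Vector],
                   threshold: float) -> list[tuple[int, str, float]]:
    """Return (index, pseudo_label, similarity) for every unlabeled vector whose
    nearest labeled neighbour has cosine similarity >= threshold."""
    selected = []
    for idx, u in enumerate(unlabeled):
        best_label, best_sim = None, -1.0
        for label, feats in labeled.items():
            for f in feats:
                sim = cosine(u, f)
                if sim > best_sim:
                    best_label, best_sim = label, sim
        if best_label is not None and best_sim >= threshold:
            selected.append((idx, best_label, best_sim))
    return selected


labeled = {"jet": [[1.0, 0.0]], "helicopter": [[0.0, 1.0]]}
unlabeled = [[0.9, 0.1], [0.5, 0.5], [0.05, 1.0]]
picked = select_similar(labeled, unlabeled, threshold=0.9)
print([(i, lbl) for i, lbl, _ in picked])  # [(0, 'jet'), (2, 'helicopter')]
```

The ambiguous middle sample (similarity about 0.71 to both classes) is rejected, which illustrates why a threshold is used: wrong pseudo-labels would add noise rather than information.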

Wang [2] proposed a new unsupervised training module that enables the network to learn from more real-world unlabeled data. It improves the network's ability to represent more general and richer data, so that the network not only represents specific data sets well but also generalizes better. Boney [3] added unlabeled data to Model-Agnostic Meta-Learning (MAML) so that the model adjusts the parameters of the embedding function and the classifier in the presence of unlabeled data, adapting to new classification tasks and achieving good classification performance. Ren [4] extended the prototypical network so that it can exploit unlabeled data during training, improved the classification algorithm and its parameterized model, and achieved higher classification accuracy. Liu [5] designed a meta-learning strategy that introduces unlabeled data sets and, during training, simulates the real-world situation in which only a few labeled samples and some unlabeled data are available; the model thus learns to predict labels for unlabeled data from a small amount of labeled data, which improves its generalization ability. Hou [6] proposed a novel and simple cross-attention network for few-shot image classification and used the unlabeled query set to augment the support set, making the features of each class more representative. Xie [7] introduced unlabeled data into the classification model, used the KL divergence to define the objective on unlabeled data, and proposed Training Signal Annealing (TSA), which gradually releases the training signal from labeled data. This keeps the model from over-fitting the small labeled set while it is trained on more and more unlabeled data, and ultimately improves its classification accuracy and generalization ability. Bateni [8] proposed Transductive Conditional Neural Adaptive Processes, which incorporates unlabeled data to improve classification accuracy at test time. The method is simple and robust.
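A sketch of how unlabeled data can refine class prototypes, in the spirit of the semi-supervised prototypical network attributed to Ren [4] above. We assume the basic soft k-means variant with a single refinement step and embeddings that have already been computed; the function names and toy numbers are hypothetical. Each unlabeled point is softly assigned to every class by a softmax over negative squared distances, and the refined prototype is the weighted mean of labeled points (weight 1) and unlabeled points (soft weight).

```python
import math

Vector = list[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_prototypes(support: dict[str, list[Vector]],
                      unlabeled: list[Vector]) -> dict[str, Vector]:
    classes = list(support)
    dim = len(support[classes[0]][0])
    # 1) Ordinary prototypes: per-class mean of the labeled support embeddings.
    protos = {c: [sum(v[d] for v in support[c]) / len(support[c]) for d in range(dim)]
              for c in classes}
    # 2) Soft assignment of each unlabeled embedding to every class.
    soft = [softmax([-sq_dist(u, protos[c]) for c in classes]) for u in unlabeled]
    # 3) Refined prototype: weighted mean of labeled and softly assigned unlabeled points.
    refined = {}
    for k, c in enumerate(classes):
        num = [sum(v[d] for v in support[c]) for d in range(dim)]
        den = float(len(support[c]))
        for u, z in zip(unlabeled, soft):
            for d in range(dim):
                num[d] += z[k] * u[d]
            den += z[k]
        refined[c] = [x / den for x in num]
    return refined


support = {"a": [[0.0, 0.0]], "b": [[10.0, 0.0]]}
refined = refine_prototypes(support, [[2.0, 0.0]])
print(refined)  # prototype "a" moves toward the unlabeled point, "b" barely changes
```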

By adding weakly labeled and unlabeled data, few-shot image classification models can adapt better to new classes and achieve higher generalization ability and classification accuracy.
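The training signal annealing (TSA) idea of Xie [7] can be sketched as a growing probability threshold: early in training, labeled examples that the model already predicts confidently are dropped from the supervised loss, so the small labeled set is not memorized. The threshold and the linear, log, and exp schedules below follow the form published with the unsupervised data augmentation (UDA) method as we understand it; the function names and the step counts in the example are hypothetical.

```python
import math


def tsa_threshold(step: int, total_steps: int, num_classes: int,
                  schedule: str = "linear") -> float:
    """Threshold eta_t, growing from about 1/K early in training to about 1 at the end."""
    progress = step / total_steps
    if schedule == "linear":
        alpha = progress
    elif schedule == "log":
        alpha = 1.0 - math.exp(-5.0 * progress)
    elif schedule == "exp":
        alpha = math.exp(5.0 * (progress - 1.0))
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    return alpha * (1.0 - 1.0 / num_classes) + 1.0 / num_classes


def tsa_mask(correct_class_probs: list[float], threshold: float) -> list[bool]:
    """True = keep the labeled example in the supervised loss; examples already
    predicted with probability above the threshold are masked out."""
    return [p <= threshold for p in correct_class_probs]


eta = tsa_threshold(step=50, total_steps=100, num_classes=4)
print(eta, tsa_mask([0.3, 0.7, 0.5], eta))  # 0.625 [True, False, True]
```

The "log" schedule releases the labeled signal quickly, while "exp" holds it back until late in training, which suits the case where labeled data are very scarce.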

2.3 Add Generated Data

Adding generated data means synthesizing new images for few-shot categories to expand the training data, so that the subsequent network model classifies more stably and accurately and generalizes better. Since Generative Adversarial Nets (GAN) [9] were proposed, research on image generation has developed rapidly (Fig. 2).

Fig. 2. Structure diagram of GAN

A GAN consists of two neural networks: a generator and a discriminator. The generator transforms input noise into images realistic enough to fool the discriminator, while the discriminator learns to distinguish real samples from generated ones. Through this adversarial game, the two networks keep improving each other until they reach an equilibrium at which the generator produces realistic images.
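The adversarial game described above is written in the original GAN formulation [9] as a minimax problem over a value function V(D, G), where p_data is the data distribution, p_z the noise prior, and p_g the distribution of generated samples. For a fixed generator, the optimal discriminator has the closed form in the second line, and the equilibrium is reached when p_g = p_data, where D* outputs 1/2 everywhere.

```latex
\min_{G}\max_{D} V(D,G) =
  \mathbb{E}_{x\sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z\sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]

D^{*}_{G}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_{g}(x)}
```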

Mehrotra [10] proposed a new few-shot learning algorithm that introduces strong regularization terms into the loss function: a GAN is incorporated, and the generator and discriminator losses serve as regularization terms that strengthen the similarity-based classification and matching task, which effectively improves classification. Ali-Gombe [11] proposed a GAN-based few-shot image classification method that classifies images into multiple fake or real classes. The method handles the fake classes and processes labeled and unlabeled data simultaneously during training, which further improves classification performance. Xian [12] developed a conditional generative model that combines the strengths of the variational autoencoder and the GAN to learn explainable image features and improve classification accuracy when training data are scarce. Nguyen [13] proposed a new self-supervised few-shot learning method that uses a GAN as the backbone and adds a reconstruction loss and a triplet loss during training, which improves the discriminator and the feature extraction ability of the generator and yields higher classification accuracy. LoFGAN [14] generates additional data from the few available samples. It randomly splits the few input images into one base image and several reference images, replaces local features of the base image with the most closely related local features, and thereby generates more realistic, diverse, and higher-quality images with finer detail, while also improving the stability of the network. Huang [15] proposed a data augmentation method for few-shot infrared aircraft images that uses a pyramid multi-scale GAN structure to learn and fit the feature information of a single image at different scales. The method improves the generator for infrared aircraft images, strengthens the feature representation of small receptive fields, and enriches image details. Liu [16] proposed a lightweight GAN built from two main components, a skip-layer channel-wise excitation module and a self-supervised discriminator, which achieves few-shot image synthesis with minimal computational cost.

GANs have become one of the most popular research directions in image generation because adversarial training continuously improves their generation ability, ultimately enabling them to produce realistic images.

3 Few-Shot Learning Based on Model

3.1 Metric Learning Model

The goal of metric learning is to learn a similarity measure, that is, a measure of how similar two samples (or categories) are. Typical metric-learning-based few-shot methods include the Siamese Network, Matching Network, Prototypical Network, and Relation Network. These networks use different metric loss functions and similarity measures to solve few-shot image classification tasks (Fig. 3).

Siamese Network [17] consists of twin CNNs that share the same weights. The two CNNs take a pair of samples as input, and classification is performed according to their similarity. The main idea is to map the inputs into a target space with an embedding function and compute their similarity with a simple distance function. Wang [18] proposed an attention-based Siamese Network that uses an efficient convolutional network to learn the embedding function and a simple attention function to compute the similarity of the two feature vectors, which improves the generalization ability of the classifier. Jadon [19] proposed a kernel-based non-parametric activation function that improves the accuracy and stability of the network by learning appropriate embeddings. Shao [20] proposed the Meta Siamese Network, which is trained episodically within a flexible framework. The network learns both a feature embedding and a deep similarity metric, and introduces two distance-based losses to assist optimization.
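The twin-branch idea above can be sketched with a shared embedding and a pairwise loss. As assumptions for illustration: a linear map stands in for the CNN branch, Euclidean distance is the "simple distance function", and the contrastive loss (a common choice for Siamese training, not necessarily the loss used in [17]) pulls same-class pairs together and pushes different-class pairs at least a margin apart. All function names are hypothetical.

```python
import math

Vector = list[float]
Matrix = list[list[float]]


def embed(x: Vector, weights: Matrix) -> Vector:
    """Shared embedding: both inputs go through the SAME weights (the 'twin' property)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def distance(x1: Vector, x2: Vector, weights: Matrix) -> float:
    """Euclidean distance between the two embeddings."""
    e1, e2 = embed(x1, weights), embed(x2, weights)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))


def contrastive_loss(x1: Vector, x2: Vector, same_class: bool,
                     weights: Matrix, margin: float = 1.0) -> float:
    """Squared distance for same-class pairs; squared hinge on (margin - d) otherwise."""
    d = distance(x1, x2, weights)
    return d ** 2 if same_class else max(0.0, margin - d) ** 2


def one_shot_classify(query: Vector, support: dict[str, Vector], weights: Matrix) -> str:
    """Assign the query to the class of its nearest support example in embedding space."""
    return min(support, key=lambda c: distance(query, support[c], weights))


W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for illustration only
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], True, W))               # 25.0
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], False, W, margin=6.0))  # 1.0
```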

Fig. 3. Structure diagram of metric learning model

Matching Network [21] embeds the support and query images into a vector space and uses an attention mechanism between the query embedding and the support embeddings to map few-shot data to the corresponding labels. Mai [22] proposed a feature-level attention mechanism that allows the similarity function to better reflect the feature differences between classes, improving the feature extraction capability and robustness of the network. Li [23] proposed a meta-network module that learns transferable prior knowledge across tasks and directly generates network parameters for similar unseen tasks, achieving fast learning and adaptation to new classes. Chen [24] proposed a cascaded feature matching network whose feature matching blocks use multi-scale representations to capture the high correlation between the object and the compared image, achieving better classification.
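In the common formulation of Matching Networks, the predicted class distribution of a query is an attention-weighted sum of the one-hot labels of the support samples, with attention given by a softmax over cosine similarities. The sketch below assumes the embeddings are already computed and omits the full-context embedding modules of the original method; the function names and toy vectors are hypothetical.

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matching_predict(query: Vector, support: list[tuple[Vector, int]],
                     num_classes: int) -> list[float]:
    """Class distribution for `query`: attention-weighted sum of one-hot support labels."""
    attention = softmax([cosine(query, x) for x, _ in support])
    probs = [0.0] * num_classes
    for a, (_, label) in zip(attention, support):
        probs[label] += a  # adding `a` to one entry = weighting a one-hot label vector
    return probs


support = [([1.0, 0.0], 0), ([0.9, 0.2], 0), ([0.0, 1.0], 1)]
probs = matching_predict([1.0, 0.1], support, num_classes=2)
print(probs)  # most of the probability mass goes to class 0
```

Because the classifier is non-parametric over the support set, new classes can be handled at test time simply by supplying new labeled support examples.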

Prototypical Network [25] assumes that each category has a prototype in the embedding space. A deep neural network maps images to vectors, and the mean of the vectors belonging to a category serves as that category's prototype. A query is then classified by comparing the Euclidean distance between its vector and the prototype of each category. Fort [26] proposed a novel architecture that uses the uncertainty of individual data points as weights when measuring distances to categories in the embedding space, improving few-shot classification accuracy. Wang [27] proposed a novel architecture that adds new modules to the Siamese Network to learn high-quality prototypical representations of each category, further improving classification accuracy. Lim [28] proposed Efficient-PrototypicalNet, which uses pre-trained models as feature extractors to reduce task complexity and applies knowledge distillation to the framework.
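The prototype computation and nearest-prototype rule described above can be sketched directly. We assume precomputed embeddings and, following the usual practice for prototypical networks, use the squared Euclidean distance inside a softmax to turn distances into class probabilities; the class names and coordinates are hypothetical.

```python
import math

Vector = list[float]


def mean(vectors: list[Vector]) -> Vector:
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(len(vectors[0]))]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypes(support: dict[str, list[Vector]]) -> dict[str, Vector]:
    """Prototype of each class = mean of its support embeddings."""
    return {c: mean(vs) for c, vs in support.items()}


def classify(query: Vector, protos: dict[str, Vector]) -> dict[str, float]:
    """Softmax over negative squared distances to the prototypes."""
    classes = list(protos)
    scores = [-sq_dist(query, protos[c]) for c in classes]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(classes, exps)}


support = {"fighter": [[0.0, 0.0], [0.0, 2.0]], "transport": [[4.0, 0.0], [4.0, 2.0]]}
protos = prototypes(support)
probs = classify([1.0, 1.0], protos)
print(protos)                     # {'fighter': [0.0, 1.0], 'transport': [4.0, 1.0]}
print(max(probs, key=probs.get))  # fighter
```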

Relation Network [29] models the metric itself: instead of using a single fixed distance measure, it trains a network to learn the distance measure. Hui [30] proposed a self-attention relation network that introduces an attention module to enhance the learned features; the query set and support set are then compared in the relation module for accurate classification. Ashrafi [31] focused on the relation network and introduced embedding modules and an attention mechanism to achieve faster generalization and better classification with short training. Abdelaziz [32] extended the relation network to be position-aware, fused multi-scale features, and measured the similarity between the combined features, producing a more robust model with better generalization.
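The difference between a fixed metric and a learned one can be illustrated with a tiny trainable relation module that outputs a relation score in (0, 1) and is trained with a mean squared error against targets 1 (same class) and 0 (different class). As a deliberate simplification, and not the architecture of [29], the pair is represented by the element-wise absolute difference of two precomputed embeddings and scored by a single sigmoid unit; the class name `RelationModule` and the toy pairs are hypothetical.

```python
import math

Vector = list[float]


def pair_features(a: Vector, b: Vector) -> Vector:
    """Simplified pair representation: element-wise |a - b|."""
    return [abs(x - y) for x, y in zip(a, b)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class RelationModule:
    """Learned similarity: relation score in (0, 1), trained by MSE to 1 (same) / 0 (different)."""

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def score(self, a: Vector, b: Vector) -> float:
        phi = pair_features(a, b)
        return sigmoid(sum(wi * fi for wi, fi in zip(self.w, phi)) + self.b)

    def train_step(self, a: Vector, b: Vector, target: float, lr: float) -> None:
        phi = pair_features(a, b)
        r = self.score(a, b)
        g = 2.0 * (r - target) * r * (1.0 - r)  # d/dz of (sigmoid(z) - target)^2
        self.w = [wi - lr * g * fi for wi, fi in zip(self.w, phi)]
        self.b -= lr * g


pairs = [([0.0, 0.0], [0.2, 0.1], 1.0), ([3.0, 3.0], [3.1, 2.9], 1.0),
         ([0.0, 0.0], [3.0, 3.0], 0.0), ([0.2, 0.1], [3.1, 2.9], 0.0)]
module = RelationModule(dim=2)
for _ in range(2000):
    for a, b, t in pairs:
        module.train_step(a, b, t, lr=0.5)
print([round(module.score(a, b), 2) for a, b, _ in pairs])
```

After training, same-class pairs score above 0.5 and different-class pairs below it, so the decision rule has been learned from data rather than fixed in advance.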

Metric-learning-based models measure distances in a relatively simple way. In recent years they have matured, achieved good image classification performance, and become relatively stable. The core of metric learning is choosing an appropriate distance measure.

3.2 Meta-learning Model

In few-shot learning, we often face many different new classification tasks. Meta-learning uses prior knowledge and experience to guide the learning of new tasks, giving the network the ability to learn how to learn (Fig. 4).

Fig. 4. Structure diagram of meta-learning model

Meta-learning mainly consists of two stages: meta-training and meta-testing. In the meta-training stage, the model is exposed to many independent supervised tasks constructed from auxiliary data sets, so that it learns how to adapt to future related tasks. The label space of the meta-test stage is disjoint from that of the meta-training stage; in the meta-test stage, the few labeled samples of each new task are used to adapt the network and thereby accomplish the few-shot learning task.
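The episodic task construction described above is usually implemented as N-way K-shot sampling: pick N classes, then K labeled support samples and some query samples per class, with labels re-indexed inside each episode. The sketch below uses a hypothetical data set of placeholder image IDs and shows meta-training and meta-testing drawing from disjoint class sets; `sample_episode` and the split sizes are assumptions for illustration.

```python
import random


def sample_episode(dataset: dict[str, list[str]], n_way: int, k_shot: int,
                   n_query: int, rng: random.Random):
    """Build one N-way K-shot task; labels are re-indexed 0..N-1 inside the episode."""
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        items = rng.sample(dataset[cls], k_shot + n_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return classes, support, query


# Hypothetical data set: 8 classes x 10 image IDs (placeholder names).
data = {f"class{i}": [f"class{i}_img{j}" for j in range(10)] for i in range(8)}
# Disjoint label spaces: meta-training uses classes 0-4, meta-testing classes 5-7.
meta_train = {c: data[c] for c in list(data)[:5]}
meta_test = {c: data[c] for c in list(data)[5:]}

rng = random.Random(42)
train_classes, s, q = sample_episode(meta_train, n_way=3, k_shot=1, n_query=2, rng=rng)
test_classes, s2, q2 = sample_episode(meta_test, n_way=3, k_shot=1, n_query=2, rng=rng)
print(len(s), len(q))                          # 3 6
print(set(train_classes) & set(test_classes))  # set()
```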

Finn [33] proposed the MAML framework. Its key idea is to train the model's initial parameters so that, on a new task, the model performs well after only a few gradient descent updates computed from a small number of samples. Li [34] proposed a meta-learning stochastic gradient descent model that learns how to initialize and adapt a learner in a single step in supervised learning. Compared with other models, it is simple, easy to implement, and easy to train. Jamal [35] proposed a general task-agnostic meta-learning model that meta-learns an unbiased initial model with maximal uncertainty over the output labels, improving the generalization ability of the model. Xu [36] proposed an unsupervised meta-learning model that constructs tasks using clustering embeddings and data augmentation and transfers data between the inner and outer loops, which addresses insufficient data diversity and achieves good classification accuracy. Cao [37] proposed a new concept-based meta-learning method that learns new representations with interpretable concept dimensions. The learner learns a mapping from high-level concepts into semi-structured metric spaces, improving the generalization ability of the model.
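The inner/outer loop structure of MAML can be shown on a one-parameter toy problem where every gradient, including the second-order term, has a closed form. Assumptions for this sketch: each task is a line y = a·x with its own slope a, the model is y = w·x, one inner gradient step is used, and the same inputs serve for the inner and outer losses (real MAML typically evaluates the outer loss on separate query samples). The function `maml_toy` and all numbers are hypothetical.

```python
def maml_toy(task_slopes: list[float], xs: list[float], alpha: float,
             beta: float, meta_steps: int, w: float = 0.0) -> float:
    """MAML on 1-D tasks y = a * x with model y = w * x.

    Task loss L_a(w) = mean((w*x - a*x)^2) = (w - a)^2 * m, where m = mean(x^2).
    """
    m = sum(x * x for x in xs) / len(xs)
    for _ in range(meta_steps):
        meta_grad = 0.0
        for a in task_slopes:
            # Inner loop: one gradient step on task a, starting from the shared init w.
            w_adapted = w - alpha * 2.0 * (w - a) * m
            # Outer gradient by the chain rule: L_a'(w_adapted) * d(w_adapted)/dw.
            meta_grad += 2.0 * (w_adapted - a) * m * (1.0 - 2.0 * alpha * m)
        # Outer loop: update the initialization with the averaged meta-gradient.
        w -= beta * meta_grad / len(task_slopes)
    return w


w0 = maml_toy(task_slopes=[1.0, 2.0, 3.0], xs=[1.0, 2.0], alpha=0.05, beta=0.1, meta_steps=200)
print(round(w0, 6))  # 2.0
```

In this symmetric toy the learned initialization converges to the mean task slope, the point from which a single inner step moves equally well toward any of the training tasks.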

A meta-learning model undergoes extensive meta-training in which every training task is different, so the model handles a different problem each time. After training, the model is therefore better able to handle new tasks without depending on the characteristics of any specific task, and it performs well even on new tasks with only a few samples.

4 Few-Shot Learning Based on Optimization

4.1 Parameter Optimization

In few-shot image classification, the parameters of the feature extraction and classification networks strongly affect classification accuracy, and appropriate parameters enable the network to produce good results. Finding suitable strategies for optimizing network parameters is therefore another research direction in few-shot image classification.

Lee [38] proposed a parameter-optimization-based algorithm that introduces task-related transformation matrices and mask matrices, so that the parameter update process is determined by the requirements of the task and the parameters are learned in a subspace corresponding to the task; as a result, the model converges quickly and classifies well. Kim [39] combined neural architecture search with meta-learning: network parameters are trained, more complex structures are then searched starting from a good network structure, and these steps are repeated until the optimal network structure and parameters are found, achieving better few-shot image classification. Dove [40] also used neural architecture search to find the optimal network and proposed a new structure for modifying the parameters of different operations, which further improves the adaptability of the network to new tasks. It also adopts a bilevel iterative optimization training method to alleviate over-fitting in few-shot image classification. Elsken [41] introduced neural architecture search into a parameter-optimization-based few-shot learning algorithm, learning both the initialization parameters and the parameters of the network architecture.

4.2 Meta-learner Optimization

In few-shot image classification, common learners tend to over-fit because they normally need a very large number of training iterations to converge, which harms both classification accuracy and the generalization ability of the network model. It is therefore necessary to optimize the learner itself, so that the model learns efficiently, converges quickly, and classifies accurately even with only a small number of samples.

Some researchers [42] proposed an optimization algorithm based on a Long Short-Term Memory (LSTM) meta-learner, which learns initialization parameters that allow the learner to converge quickly and learns how to update the parameters appropriately, providing a good parameter update strategy and making the network better suited to few-shot learning tasks. Antoniou [43] proposed a MAML-based optimization strategy that learns a separate learning rate and separate batch normalization statistics for each step of the inner loop, reducing hyperparameter sensitivity and generalization error and improving the stability and convergence speed of MAML. Rusu [44] introduced a low-dimensional latent space on top of MAML, performed the inner-loop parameter updates in that space, and obtained model parameters by random sampling, which helps handle the uncertainty inherent in few-shot learning. Bertinetto [45] optimized the classifier inside the meta-learner, adopting ridge regression and logistic regression to improve the generalization ability of the model. Zhang [46] proposed a new meta-learning-based prototype optimization framework that treats the gradient and its flow as meta-knowledge, and then proposed a meta-optimizer based on Neural Ordinary Differential Equations to optimize the prototypes and address prototype bias. Tian [47] proposed a meta-contrastive loss as a suitable meta-regularizer to further improve the generalization ability of meta-learning models.
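Two of the meta-learner designs above have compact mathematical forms, sketched here as we understand the published methods. For an LSTM meta-learner such as [42], the learner's parameters θ play the role of the LSTM cell state, with the negative gradient as the candidate update and learned forget and input gates f_t and i_t; plain gradient descent is the special case f_t = 1, i_t = α. For a closed-form ridge-regression classifier such as the one in [45], X is assumed to be the n×d matrix of support embeddings and Y the n×N one-hot label matrix; the last form only inverts an n×n matrix, which is cheap because n is tiny in few-shot tasks, and it is differentiable, so the embedding network can be trained through it.

```latex
% LSTM-style meta-learner update (theta_t acts as the cell state)
\theta_t = f_t \odot \theta_{t-1} + i_t \odot \bigl(-\nabla_{\theta_{t-1}} \mathcal{L}_t\bigr)

% Closed-form ridge-regression classifier on the support set
W^{*} = \arg\min_{W} \lVert X W - Y \rVert^{2} + \lambda \lVert W \rVert^{2}
      = \bigl(X^{\top} X + \lambda I_d\bigr)^{-1} X^{\top} Y
      = X^{\top} \bigl(X X^{\top} + \lambda I_n\bigr)^{-1} Y
```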

5 Development Trend and Application for Few-Shot Learning in Image Classification of Aerial Objects

First, the few-shot image classification methods reviewed above are summarized and compared in Table 1.

Considering the advantages and disadvantages of these few-shot learning methods and their contributions to few-shot image classification accuracy, the development trends of few-shot learning in aerial image classification are summarized below (Table 1).

Table 1. Summary of advantages and disadvantages of various few-shot learning methods

The major problem in aviation few-shot image classification is insufficient data, so data augmentation is a very intuitive approach. Regarding future trends at the data level, ordinary data augmentation will no longer be the focus of research; instead, other prior knowledge will be used to train models, or unlabeled data will be exploited more effectively. Although labeled samples in aviation are very scarce, the large amount of real-world unlabeled data contains a great deal of information, so training models with unlabeled data deserves further study. In addition, GANs can generate images that are practically indistinguishable from real ones and are also suitable for augmenting few-shot aerial images. Finding stable GAN training strategies and better feature extraction and classification networks, so as to avoid mode collapse and insufficient image diversity, will be a focus of future research.

For model-based few-shot learning in aerial image classification, the most critical factors for metric learning models are feature extraction from a small number of samples and an appropriate metric. Selecting suitable feature extraction and comparison networks can improve the accuracy of aerial image classification. Choosing more effective learned neural-network metrics, rather than fixed ones, is also an important direction for further research. Meta-learning models have become a new trend in recent years and need further study: different network models can be designed for different problems, for example by introducing attention mechanisms or designing suitable loss functions and penalty mechanisms, to achieve the strong generalization and the ability to adapt to new tasks that meta-learning promises in concept. In addition, the interpretability of meta-learning networks needs continued study, so that networks and training methods can be improved on the basis of their internal mechanisms.

For optimization-based aerial few-shot image classification, research on parameter optimization for meta-learners is likely to grow in the coming years, and current models are not yet mature, so better meta-learners need to be designed. Important future directions include combining parameter optimization with meta-learner optimization, and designing and improving meta-learners so that they learn more, and more effective, meta-knowledge and better initialization parameters.

In general, the three research directions above are not independent of each other, and combining data, model, and optimization approaches is also an important trend for future development. Applying few-shot learning in the aviation field can enable high-precision, high-stability model deployment for aviation tasks with little available data and, to some extent, reduce the high cost of data labeling in aviation applications.

6 Conclusion

In summary, there are many ways to address few-shot image classification, but related research in the aviation field is still very limited. This paper reviews and summarizes existing few-shot image classification techniques and discusses the trends of few-shot learning in aerial image classification. We hope that it provides ideas for applying few-shot learning to aerial imagery and thereby helps solve practical problems.