
1 Introduction

A key complication with conventional classifiers is that they require enormous quantities of labelled samples for accurate training. 'Labels are hard to obtain while unlabelled data are abundant, therefore semi-supervised learning is a good idea to reduce human labour and improve accuracy' [24]. Modern dataset sizes are ever increasing, but acquiring label information for these data is a demanding and complicated task. Semi-supervised learning has therefore gained considerable practical significance.

Automatic classification of personal data is of significant relevance today. Classifying such data is a challenge because the categories desired by an individual may not have sufficient labelled instances for training, and the user would have to hand-label the training repository, which becomes infeasible as the numbers approach millions. We present a novel deep CNN semi-supervised learning architecture that uses clonal selection techniques for such applications with limited labelled data. The high-level complex features learnt by deep models are more resilient and expressive than those of shallow classical methods. We harness this potential of deep learning and represent the data as deep features. The result is an innovative generative model that gives appreciable results when working with unlabelled data alongside small labelled sets, specifically in the domain of personal photo collections.

2 Review

A vast number of semi-supervised clustering techniques are found in the literature, such as nonnegative matrix factorization via constraint propagation [19], active learning [20], hierarchical clustering [23], linear discriminant clustering [10], kernel mean shift clustering [17], maximum margin clustering [21] and more [7]. A well-structured semi-supervised learning technique was proposed by Fergus et al. [5], and Liu et al. [11] put forth a proposal that clusters billions of images using MapReduce.

Semi-supervised learning methods in the literature [25] include generative models, such as self-training and Expectation–Maximization with mixture models, and discriminative models, such as graph-based methods, Gaussian processes and support vector machines. Expectation–Maximization is prone to local maxima, and with these methods unlabelled instances can be detrimental to learning, for instance when a local maximum lies far from the global maximum. Normally a method is chosen based on how well its assumptions fit the structure of the problem. Self-training is a common and popular approach to semi-supervised learning. Initially, a small quantity of labelled data is used to train a classifier. The trained classifier then labels the unlabelled data, and the most confidently labelled points, together with their newly learnt labels, are added to the original labelled set. The enlarged dataset retrains the classifier, and the cycle repeats. This technique has been successfully applied to many natural language processing problems: subjective nouns were identified by Riloff et al. [15], classification of dialogues with two classifiers was accomplished by Maereizo et al. [12] in 2004, and in 2005 Rosenberg et al. [16] achieved object detection in images using self-training. Though self-training is a hard algorithm to analyse, Culp and Michailidis [3] have studied the convergence of algorithms in this setting. We use self-training in our work.

Deep hybrid architectures in semi-supervised environments have been successfully applied to a multitude of recognition problems. Deep models have surpassed popular shallow architectures, especially in the image [18] and language [6] domains. Most of these architectures use a greedy approach to pretraining and undergo multistage generative learning. Various auxiliary approaches have been used to help deep models in early learning and to tackle recalcitrant input variations [22]. Two interesting hybrid semi-supervised deep architectures [13] combine multi-objective learning with an efficient layer-wise greedy approach for text categorization and optical character recognition, but auxiliary free parameters are added, introducing further challenges. These hybrid semi-supervised deep models are promising, with good results, yet such architectures undeniably have limitations. Further, the test images used so far have been very small, with neither changes in illumination nor background clutter nor the other problems that are inevitable in many natural personal datasets [9]. Kernel methods, by comparison, disregard both the structure and the dimensionality of the input data; their flexibility and scalability are also inadequate, besides needing large amounts of training data.

The architecture of our model is discussed in the next section. The experiments conducted and their results follow in the subsequent sections. Discussion and future work conclude the paper.

3 Architecture of the Integrated Semi-Supervised Learning and Classification Model

We have designed and realized a semi-supervised artificial immune hybrid classifying framework, the SS-AIHC model, presented in Fig. 1. The model consists of a series of convolution and subsampling layers constituting a deep Convolutional Neural Network (CNN) architecture integrated with Clonal Selection (CS). The softmax layer of our earlier supervised model, the CNN-AIHC [1], is replaced with the Artificial Immune System inspired classifier, AIHC [2]. The complete training of the novel SS-AIHC model, culminating in the memory cell maturation process, can be divided into three modules, as shown in Fig. 2.

Fig. 1 Semi-supervised learning

Fig. 2 Semi-supervised AIHC model: learning framework

The model is first trained with the fully labelled data in the supervised Convolutional-AIS module. The model parameters are then fine-tuned with additional, artificially generated data for each class, constituting module 2; this supervised classification is performed with artificial data produced using the clonal selection algorithm. Finally, the unlabelled data is used in module 3, called the semi-supervised stage, to benefit the system further during training and learning. All the stages work together to mature the memory cells of the novel SS-AIHC framework. A trained SS-AIHC classifier consists of matured memory cells for each class, obtained from the labelled, clonal and unlabelled data. These memory cells are the sets of antibodies representing each class; they are initialized randomly, and the model automatically matures and enhances them. The CNN generates a distinct pattern for each input sample, and the Clonal Selection Algorithm [4] inspires the generation of optimal additional data. Among the many affinity measures available, such as Euclidean distance, relative distance and Manhattan distance, we use the inner product to ascertain the affinity between two samples, as it resulted in the best performance.
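For concreteness, the sketch below shows how such affinity measures can be computed in Python with numpy; the function names are our illustration (the model itself uses only the inner product), and all three are oriented so that larger values mean higher affinity.

import numpy as np

def affinity_inner_product(a, b):
    # Affinity as the inner product of two feature vectors;
    # larger values indicate closer samples.
    return float(np.dot(a, b))

def affinity_euclidean(a, b):
    # Alternative measure: negated Euclidean distance.
    return -float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def affinity_manhattan(a, b):
    # Alternative measure: negated Manhattan (L1) distance.
    return -float(np.sum(np.abs(np.asarray(a) - np.asarray(b))))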

A deep CNN architecture is used to learn the data features and is realized by alternately stacking convolution and sampling layers. Each input sample is repeatedly convolved with a linear filter, a bias term is added, and the result is passed through a non-linear function to generate its feature map.

$$\begin{aligned} n^k_{ij} = f \Big (\sum _{v}\sum _{x=0}^{X_i -1}\varOmega _{ijv}^{x}\eta _{(i-1)v}^{k+x} + b_{ij}\Big ) \end{aligned}$$
(1)

where \(n^k_{ij}\) is the value of the neuron at the kth position of the jth map in the ith layer. In (1), v indexes the maps of the previous layer, i.e. the layer \((i-1)\), and \(\varOmega _{ijv}^{x}\) represents the weight at position x for the vth feature map. \(X_i\) is the kernel width, \(b_{ij}\) the bias of the current map in the current layer, and f a non-linear function such as tanh.

The k kernels of the convolutional layer produce k feature maps of size (m − n + 1) \(\times \) (m − n + 1), where m \(\times \) m is the dimension of the input sample and n \(\times \) n is the size of the kernel. Each map is subsampled with max pooling, which provides invariance.

$$\begin{aligned} p_j = \max _{M\times 1}\Big (p_i^{m\times 1}\, w(m,1)\Big ) \end{aligned}$$
(2)

where \(p_j\) is the maximum and w(m, 1) is the window function. Each pooling-layer neuron combines an M \(\times \) 1 patch of the convolutional layer. The entire CNN is trained using the backpropagation algorithm. We explain each stage of the learning process in detail in the next sections.
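A minimal numpy sketch of Eqs. (1) and (2), written in one dimension for clarity, is given below; the array shapes, function names and constants are our assumptions for illustration, not the exact implementation used in the experiments.

import numpy as np

def conv_layer(prev_maps, weights, bias):
    # Eq. (1): out[k] = f(sum over maps v and offsets x of
    # weights[v, x] * prev_maps[v, k + x] + bias), with f = tanh.
    # prev_maps: (V, L) array, V maps of length L from layer (i - 1)
    # weights:   (V, X) array, one kernel of width X per previous map
    # bias:      scalar bias of the current output map
    V, L = prev_maps.shape
    X = weights.shape[1]
    out = np.empty(L - X + 1)
    for k in range(L - X + 1):
        out[k] = np.sum(weights * prev_maps[:, k:k + X]) + bias
    return np.tanh(out)

def max_pool(feature_map, M):
    # Eq. (2): maximum over non-overlapping M x 1 windows.
    usable = len(feature_map) - len(feature_map) % M
    return feature_map[:usable].reshape(-1, M).max(axis=1)

The two-dimensional analogue of conv_layer, applied to an m \(\times \) m input with an n \(\times \) n kernel, yields the (m − n + 1) \(\times \) (m − n + 1) feature maps described above.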

3.1 Supervised Convolutional AIS: Module 1

Since the memory cells of the classifier are initialized randomly, the initial epochs 1 to \(t_1\) of Fig. 2 use only the original labelled data to train the deep CNN. This optimizes the population of memory cells toward the best representation of each class. The entire dataset is divided into batches of images. Each batch is fed through the multiple convolutional and subsampling layers of the deep CNN, generating a feature map for every image. These feature maps are flattened into one-dimensional feature vectors, yielding an N \(\times \) D matrix, where N is the number of images and D the dimension of each vector. Borrowing our terminology from Artificial Immune Systems, we name this the antibody set. From this set one antibody is chosen at a time and is termed the antigen. For each antibody in the set, execute the following:

{
  label of \(picked_{antigen}\) = label(antigen(i));

  Pick the class corresponding to the label of the antigen; let this class be \(class_i\).

  Do {

    • Compute the affinity, i.e. the inner product, of the chosen antigen with each antibody of the predetermined antibody set (\(N_i\)) of the class and store the values in a local array.

    • Choose the \(n_1\) antibodies from the antibody set having the highest affinity values.

    • Generate additional features using the principles of clonal selection. This process yields \(n_2\) new antibodies.

    • Choose the best \(N_i\) antibodies from the total of the old (\(N_i\)) and new (\(n_2\)) ones.

  }
}

This process leads to optimal maturation of the antibodies in the memory cells of each class using the labelled data only; it is the clonal selection process that optimally augments the labelled data. Once the classifier is trained in this way, data can be passed through the AIHC to ascertain its labels: the class showing maximum affinity is chosen as the output class. The output is compared with the original label, and the error is calculated and backpropagated. The entire process is presented in Fig. 3. In this way both the memory cells, which are the trained representative antibodies of each class, and the kernels of the convolutional layers get trained for each class.
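The following sketch illustrates this maturation loop for a single antigen; the parameters n1 and n2 and the Gaussian mutation operator are our assumptions, since the text does not fix the mutation scheme.

import numpy as np

def mature_memory_cells(antigen, memory, n1=5, n2=10, sigma=0.1, rng=None):
    # One Module-1 maturation step for a single antigen.
    # antigen: (D,) feature vector; memory: (N_i, D) antibodies of the
    # antigen's class.  Returns an updated memory set of the same size.
    rng = rng or np.random.default_rng(0)
    N_i = memory.shape[0]
    # affinity of the antigen with every stored antibody (inner product)
    affinity = memory @ antigen
    # the n1 antibodies with the highest affinity
    best = memory[np.argsort(affinity)[-n1:]]
    # clonal expansion: n2 mutated copies drawn from the best antibodies
    clones = best[rng.integers(0, n1, size=n2)]
    clones = clones + sigma * rng.standard_normal(clones.shape)
    # keep the best N_i antibodies from the old and new pools combined
    pool = np.vstack([memory, clones])
    return pool[np.argsort(pool @ antigen)[-N_i:]]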

Fig. 3 Supervised convolutional-AIHC model

3.2 Enhancement of the Model with Misclassification Error: Module 2

This process runs from epoch \(t_1\) to \(t_2\) in Fig. 2. Training continues with the partially trained and optimally populated supervised CNN-AIHC obtained in the previous stage; the module is explained in Fig. 4. The misclassification at the first (original) output layer is used to produce additional data for further training at the feature level. The convergence of the model is directly related to this misclassification.

Fig. 4 Enhancement of the model with misclassification error

The entire dataset contains images from each class, and the feature set is divided into blocks by class: one class may have \(n_1\) feature vectors of size d and another \(n_2\) feature vectors of size d. Each feature vector is fed to the semi-trained model and its misclassification is calculated, giving an error for every feature vector in every class. Based on these errors, clonal selection and mutation are performed within each class, creating new additional data, where

Clonal Rate \(\propto \) 1/error and Mutation Rate \(\propto \) 1/error.

This process generates artificial clonal training data based on misclassification. The SS-AIHC model now has new batches along with the original ones. The entire set of data is used to mature the memory cells exactly as explained in module 1: all batches are given to the model and the resulting error is backpropagated as usual. The newly generated training data strengthens the model's learning and hence improves the accuracy of the overall system. Figure 5 illustrates the memory-maturation process.
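A sketch of this error-driven generation step is given below; the base constants, the caps and the Gaussian mutation are our assumptions, while the inverse proportionality to the error follows the relations stated above.

import numpy as np

def generate_clonal_data(features, errors, base_clones=5, base_sigma=0.05,
                         eps=1e-6, max_clones=100, rng=None):
    # Module-2 sketch: features is an (n, d) array of one class's feature
    # vectors and errors an (n,) array of their misclassification errors.
    # Clonal rate and mutation rate are both proportional to 1/error.
    rng = rng or np.random.default_rng(0)
    new_data = []
    for x, e in zip(features, errors):
        n_clones = min(int(base_clones / (e + eps)), max_clones)
        sigma = min(base_sigma / (e + eps), 1.0)
        clones = x + sigma * rng.standard_normal((n_clones, features.shape[1]))
        new_data.append(clones)
    return np.vstack(new_data)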

The model now progresses to its final stage.

Fig. 5 Memory-maturation process in the semi-supervised convolution-AIS model

3.3 Semi-Supervised Convolution-AIS: Module 3

Modules 1 and 2 are the pretraining stages before the actual semi-supervised stage; together they yield a trained supervised integrated Convolutional-AIS classifier built from the labelled data and the misclassification error. To further strengthen the learning system, the model now uses the unlabelled data. The module is shown in Fig. 6.

Fig. 6 Semi-supervised convolution-AIS module

The following steps are undertaken:

  1. Use the model trained with the labelled data and the misclassification error, a task accomplished in the first two modules.

  2. Apply this semi-trained model to the unlabelled data and learn their labels.

  3. After ascertaining the labels of the unlabelled data, mature the memory cell population of each class using this newly labelled data together with the initial labelled data.

  4. Retrain the framework using the entire data.

  5. Repeat steps 2–4 until the convergence condition is achieved.

The unlabelled data thus aids the learning and training of the model, which subsequently improves its accuracy. This novel use of Convolutional-AIS can address the small-data problem in a semi-supervised environment. The model is now a trained SS-AIHC (Semi-Supervised Artificial Immune Hybrid Classifier) that can be used to learn and classify test data.
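The five steps above amount to a self-training loop, sketched below; the model object and its fit/predict_proba-style interface are assumptions made for illustration, standing in for the pretrained SS-AIHC.

import numpy as np

def self_train(model, X_lab, y_lab, X_unlab, confidence=0.9, max_rounds=10):
    # Steps 2-5 of Module 3 as a generic self-training loop.
    X, y = X_lab, y_lab
    for _ in range(max_rounds):
        if len(X_unlab) == 0:
            break                                  # nothing left to label
        proba = model.predict_proba(X_unlab)       # step 2: learn labels
        mask = proba.max(axis=1) >= confidence     # most confident points
        if not mask.any():
            break                                  # convergence condition
        X = np.vstack([X, X_unlab[mask]])          # step 3: grow labelled set
        y = np.concatenate([y, proba[mask].argmax(axis=1)])
        X_unlab = X_unlab[~mask]
        model.fit(X, y)                            # step 4: retrain
    return model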

4 Experiments and Results

We tested the trained SS-AIHC model with data from personal photo collections as well as on standard datasets.

For each test sample do

{

  • Extract the features of the image using the now-trained convolutional and subsampling layers.

  • Compare the extracted feature against the pre-populated memory cells of each class, following the two-layer classification process of our AIHC.

  • Calculate the affinity with each memory cell.

  • Choose the class having the maximum affinity value as the output class.

  • The photo is rightly classified if the output class matches its true label.

}
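A compact sketch of this decision rule follows; memory_cells is assumed to map each class label to its matured antibody matrix, and the single inner-product pass shown here simplifies the two-layer AIHC process.

import numpy as np

def classify(feature, memory_cells):
    # feature: (D,) test vector; memory_cells: {label: (N_i, D) array}.
    # The class whose antibodies reach the highest affinity wins.
    best_label, best_aff = None, -np.inf
    for label, antibodies in memory_cells.items():
        aff = float((antibodies @ feature).max())
        if aff > best_aff:
            best_label, best_aff = label, aff
    return best_label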

Experiment 1: Results on the MNIST dataset: We compared our semi-supervised learning results with the existing results of Pitelis's Atlas RBF, MTC (manifold tangent classifier) using CAE (contractive auto-encoders), TSVM (transductive SVM), NN (nearest neighbours) and CNN (convolutional neural networks) [8]. The semi-supervised dataset for learning was built by dividing the 50,000 samples between an unlabelled and a labelled set, with labelled sets of size 100, 500 and 1000, respectively. For larger numbers of labelled data, in the thousands, the accuracy was expectedly higher. Unlike these models, our architecture performs well with a smaller number of labelled data while incurring a cost of a similar order. Table 1 tabulates the errors of these standard semi-supervised techniques on MNIST data. The Atlas RBF follows a two-step approach and is designed specifically for high-dimensional data: the manifold of the data is first approximated in the original space using small-dimensional affine charts, completely unsupervised, and the second step uses SVM-based supervised learning, with the data points given soft assignments to the low-dimensional affine charts. The unlabelled data is used to understand the detailed shape of the underlying manifolds, which helps improve the accuracy of a classifier trained with minimal labelled data. Though the method has recorded better results, its ability and accuracy on personal data collections remain unexplored.

Table 1 Error attained with different techniques on MNIST data

Experiment 2: Results on the SVHN dataset: We compare classification performance on the far more complex SVHN image dataset with some other techniques [8] from the literature, whose reported SVHN results use 1000 labels. Table 2 shows that our model works well with SVHN data too.

Table 2 Accuracy achieved with different techniques on SVHN data

The promising results on the two standard datasets demonstrate the efficacy of our model, which records superior performance using a smaller number of labelled data.

Experiment 3: Results on Personal Photos: We have performed experiments on our own dataset, which is available at https://github.com/vandnabhalla/Database. Table 3 presents the results of the semi-supervised AIHC model.

Table 3 Accuracy achieved on personal photo album using SS-AIHC

5 Discussions and Conclusions

The deep CNN-AIS semi-supervised model, our contribution in this work, shows a definite improvement in accuracy on the standard as well as our application datasets. Tables 1 and 2 show the superiority of our model on the standard datasets, and Table 3 shows the consistency of its performance on a unique dataset comprising personal photos. We observe that with sufficiently large amounts of unlabelled data a better classifier can be realized than with labelled data alone, and our model performs better than most previous methodologies implemented for these environments. Many applications have abundant unlabelled data while labelled data is scarce. The Semi-Supervised Hybrid Deep Convolutional–Artificial Immune System architecture (SS-AIHC) uses the knowledge in a small annotated dataset to build a larger database, integrating principles of clonal selection from Artificial Immune Systems with a deep CNN; the learning is subsequently enhanced using the unlabelled data as well. The properties of the available data are thus exploited to push the boundaries of accurate classification decisions. We explore generative models with a semi-supervised learning approach and have developed a new hybrid model that generalizes effectively starting from a small hand-labelled dataset. The model is self-learning, and the labelled data is augmented from the most confident predictions. Our experiments show that, after augmenting data with Artificial Immune System techniques, deep generative models can bring about considerable improvements in semi-supervised settings. Our problem is exciting yet exacting for the following reasons:

  • It is arduous to manually hand-label any dataset. We have used a personal photo collection as an example dataset in addition to the standard MNIST and SVHN datasets.

  • Clustering such datasets with many classes is challenging, especially with few labelled instances and a large amount of unlabelled data.