Abstract
Labelled data are time consuming, expensive and difficult to procure because tagging and annotation require skilled human input. Unlabelled data, by contrast, are comparatively easy to procure, but fewer methods exist to use them optimally. Semi-supervised learning addresses this problem: it helps build better classifiers by using unlabelled data alongside sufficient labelled data, and may yield higher accuracy with considerably less human effort. However, if the labelled dataset is inadequate in size, semi-supervised techniques also struggle. We propose a novel framework in which the small labelled dataset is appropriately augmented using the learning mechanisms of artificial immune systems to train the proposed model. The model is then retrained with the unlabelled data to fortify the learning mechanism. We show that a generative deep framework utilizing artificial immune system principles provides a highly competitive approach to learning in the semi-supervised environment.
1 Introduction
A key complication with conventional classifiers is that enormous quantities of labelled samples are required for accurate training and learning. ‘Labels are hard to obtain while unlabelled data are abundant, therefore semi-supervised learning is a good idea to reduce human labour and improve accuracy’ [24]. Dataset sizes are forever increasing, but acquiring label information for these data is a demanding and complicated task. This has given semi-supervised learning considerable practical significance.
Automatic classification of personal data is highly relevant today. Classifying such data is a challenge because the categories an individual wants may not have sufficient labelled instances for training, and the user has to hand label the training data repository. Hand labelling becomes increasingly infeasible as the number of items approaches millions. We present a novel deep CNN semi-supervised learning architecture that uses clonal selection techniques for such applications with limited labelled data. The high-level complex features learnt by deep models are more resilient and expressive than those of shallow classical methods. We harness the potential of deep learning in our model and represent the data features as deep features. We have thus developed an innovative generative model that gives appreciable results when working with unlabelled data together with a small labelled dataset, specifically in the domain of personal photo collections.
2 Review
A vast number of semi-supervised clustering techniques are found in the literature, such as nonnegative matrix factorization via constraint propagation [19], active learning [20], hierarchical clustering [23], linear discriminant clustering [10], kernel mean shift clustering [17], maximum margin clustering [21] and more [7]. A well-structured semi-supervised learning technique was proposed by Fergus et al. [5], and Liu et al. [11] put forth a method that clusters billions of images using MapReduce.
Many semi-supervised learning methods in the literature [25] include generative models such as self-training and Expectation–Maximization with mixture models, and discriminative models such as graph-based methods, Gaussian processes and support vector machines. Expectation–Maximization is prone to local maxima. With these methods, unlabelled instances can be detrimental to learning in some cases, for instance where a local maximum lies far from the global maximum. Normally a method is chosen because its assumptions best fit the structure of the problem. Self-training is a common and popular approach to semi-supervised learning. Initially, a small quantity of labelled data is used to train a classifier. The trained classifier is then used to label the unlabelled data. The most confident unlabelled data points, along with their newly learnt labels, are added to the original labelled dataset, and the enlarged dataset is used to retrain the classifier. This technique has been successfully applied to many natural language processing problems. Subjective nouns were identified by Riloff et al. [15]. Classification of dialogues with two classifiers was accomplished by Maeireizo et al. [12] in 2004. In 2005, Rosenberg et al. [16] achieved object detection in images using self-training. Although self-training is an algorithm that is hard to analyse, Culp and Michailidis [3] have analysed the convergence of algorithms in this setting. We have used self-training in our work.
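The self-training loop described above can be illustrated with a minimal sketch. The nearest-centroid classifier, the distance-margin confidence score and the names `centroid_fit`, `centroid_predict`, `self_train` and `top_k` are assumptions made for illustration only; they are not the classifier used in this work.

```python
import math

def centroid_fit(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def centroid_predict(model, x):
    """Return (label, confidence). Confidence is the margin between the
    distances to the closest and second-closest centroid (an assumption)."""
    dists = sorted((math.dist(x, centre), label) for label, centre in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def self_train(X_lab, y_lab, X_unlab, top_k=2, max_rounds=10):
    """Repeatedly move the top_k most confident pseudo-labelled points
    into the labelled set and retrain the classifier."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = centroid_fit(X_lab, y_lab)
    for _ in range(max_rounds):
        if not pool:
            break
        # Label every point in the pool and rank by confidence.
        scored = [(centroid_predict(model, x), x) for x in pool]
        scored.sort(key=lambda item: item[0][1], reverse=True)
        for (label, _), x in scored[:top_k]:
            X_lab.append(x)
            y_lab.append(label)
        pool = [x for _, x in scored[top_k:]]
        # Retrain on the enlarged labelled set.
        model = centroid_fit(X_lab, y_lab)
    return model, X_lab, y_lab
```

Adding only the most confident points in each round limits the damage done by wrong pseudo-labels, at the cost of more retraining rounds.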
Deep hybrid architectures in semi-supervised settings have been successfully implemented for a multitude of recognition problems. Deep models have surpassed popular shallow architectures, especially in the image [18] and language [6] domains. Most of these architectures use a greedy approach to pretraining and undergo multistage generative learning. Various auxiliary approaches have been used to help deep models in early learning and to tackle recalcitrant input variations [22]. Two interesting hybrid semi-supervised deep architectures [13] combine multi-objective learning with an efficient layer-wise greedy approach for text categorization and optical character recognition, but the auxiliary free parameters they add introduce additional challenges. These hybrid semi-supervised deep models are promising and give good results, but such architectures have their limitations. Further, the test images used so far have been very small, with neither changes in illumination nor background clutter nor the other problems that are inevitable in many natural personal datasets [9]. Kernel methods, by comparison, disregard both the structure of the input data and its dimensionality. Their flexibility and scalability are also inadequate, and they need large amounts of training data.
The architecture of our model is discussed in the next section. The experiments conducted and their results are presented in the subsequent section. Discussion and future work conclude this paper.
3 Architecture of the Integrated Semi-Supervised Learning and Classification Model
We have designed and realized a semi-supervised artificial immune hybrid classifying framework, the SS-AIHC model, presented in Fig. 1. The model consists of a series of convolution and subsampling layers constituting a deep Convolutional Neural Network (CNN) architecture integrated with Clonal Selection (CS). The softmax layer of our earlier supervised model, the CNN-AIHC [1], is replaced with the Artificial Immune System inspired classifier, AIHC [2]. The complete training of the novel SS-AIHC model, which results in the memory cell maturation process, can be divided into three modules as shown in Fig. 2.
The model is first trained with the completely labelled data in the Supervised Convolutional-AIS module (module 1). In module 2, the model parameters are further fine-tuned with additional artificially generated data for each class. The supervised classification is performed with artificial test data produced using the clonal selection algorithm. Finally, the unlabelled data are used in module 3, called the semi-supervised stage, to benefit the system further in the training and learning process. All the stages together mature the memory cells of the novel SS-AIHC framework. A trained SS-AIHC classifier consists of matured memory cells corresponding to each class, obtained from the labelled, clonal and unlabelled data. These memory cells are the set of antibodies representing each class. The memory cells for all classes are initialized randomly, and the model automatically matures and enhances them. The CNN generates a distinct pattern for each input sample, and the Clonal Selection Algorithm [4] inspires the generation of optimal additional data. Among the many measures available, such as Euclidean distance, relative distance and Manhattan distance, we have used the inner product to ascertain the affinity between two samples, as this measure resulted in the best performance.
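To make the choice of affinity measure concrete, the following minimal sketch implements three of the measures named above on plain feature vectors. The function names and the `best_match` helper are assumptions for illustration; relative distance is omitted because its exact definition is not given here. Note that the two distances are dissimilarities, so they are negated when used as an affinity.

```python
import math

def inner_product(a, b):
    """Affinity as the inner (dot) product of two feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Euclidean distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Manhattan (city-block) distance; smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_match(antigen, antibodies, affinity=inner_product):
    """Index of the antibody with the highest affinity to the antigen."""
    return max(range(len(antibodies)),
               key=lambda i: affinity(antigen, antibodies[i]))

antigen = [1.0, 0.0]
antibodies = [[0.0, 1.0], [2.0, 0.0], [0.5, 0.5]]

# Inner product favours the long, aligned vector at index 1.
by_inner = best_match(antigen, antibodies)
# Negated Euclidean distance favours the nearest vector at index 2.
by_euclid = best_match(antigen, antibodies,
                       affinity=lambda a, b: -euclidean_distance(a, b))
```

The two measures can pick different antibodies for the same antigen, which is why the choice of measure affects classification performance.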
A deep CNN architecture is used to learn data features and is realized by alternately stacking convolution and subsampling layers. Each input sample is convolved with a linear filter, a bias term is added and the result is passed through a nonlinear function; this is repeated layer by layer to generate its feature map:
\[ n^{k}_{ij} = f\Bigl( b_{ij} + \sum_{v} \sum_{x=0}^{X_i - 1} \varOmega_{ijv}^{x}\, n^{k+x}_{(i-1)v} \Bigr) \qquad (1) \]
where \(n^k_{ij}\) is the neuron value in the ith layer of the jth map at the kth position. In (1), v indexes the feature maps of the previous layer, i.e. layer \((i-1)\), and \(\varOmega _{ijv}^{x}\) represents the weight at position x connected to the vth feature map. \(X_i\) is the kernel width, \(b_{ij}\) is the bias of the current map in the current layer and f is a nonlinear function such as tanh.
The k kernels of the convolutional layer produce k feature maps of size \(m - n + 1\), where \(m \times m\) is the dimension of the input sample and \(n \times n\) is the size of the kernel. Each map is subsampled with max pooling, which provides invariance.
\[ p_j = \max_{M \times 1}\bigl( n_i^{m \times 1}\, w(m, 1) \bigr) \qquad (2) \]
where \(p_j\) is the maximum and \(w(m, 1)\) is the window function. Each pooling layer neuron combines an \(M \times 1\) patch of the convolutional layer. The entire CNN is trained using the backpropagation algorithm. We explain each stage of the learning process in detail in the next sections.
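As an illustration of the convolution and pooling steps described above, the following minimal sketch works on one-dimensional maps for brevity. The function names, the non-overlapping pooling stride and the example values are assumptions; the identity activation in the usage lines is chosen only to keep the arithmetic easy to check, whereas the default follows the tanh nonlinearity mentioned in the text.

```python
import math

def conv_layer_1d(prev_maps, kernels, bias, f=math.tanh):
    """One output feature map of a convolutional layer.

    prev_maps: list of V feature maps from the previous layer, each of length m.
    kernels:   list of V kernels (one per previous map), each of length n.
    Returns a map of length m - n + 1 ('valid' convolution).
    """
    m, n = len(prev_maps[0]), len(kernels[0])
    return [
        f(bias + sum(kernels[v][x] * prev_maps[v][k + x]
                     for v in range(len(prev_maps))
                     for x in range(n)))
        for k in range(m - n + 1)
    ]

def max_pool1d(feature_map, window):
    """Max pooling over non-overlapping patches of the given window size."""
    return [max(feature_map[i:i + window])
            for i in range(0, len(feature_map) - window + 1, window)]

# A length-5 input and a length-3 kernel give a map of length 5 - 3 + 1 = 3.
fmap = conv_layer_1d([[1.0, 2.0, 3.0, 4.0, 5.0]], [[1.0, 0.0, -1.0]],
                     bias=0.0, f=lambda z: z)
pooled = max_pool1d([0.1, 0.7, 0.3, 0.9, 0.5], window=2)
```

The two-dimensional case applies the same sum over both spatial axes, so an \(m \times m\) input and an \(n \times n\) kernel give \(m - n + 1\) outputs along each side.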
3.1 Supervised Convolutional AIS: Module 1
Since the memory cells of the classifier are initialized randomly, the initial epochs 1 to \(t_1\) of Fig. 2 use only the original labelled data to train the deep CNN. This helps to optimize the population of memory cells toward the best representation of each class. The entire dataset is divided into batches, each consisting of a number of images. A batch is fed to the multiple convolutional and subsampling layers of the deep CNN, which generates a feature map for every image. These feature maps are converted to one-dimensional feature vectors. The result is an N \(\times \) D matrix, where N is the number of images and D is the dimension of each feature vector. Borrowing our terminology from artificial immune systems, we name this the antibody set. From this set one antibody is chosen at a time and is termed the antigen. For each antibody in the set, taken in turn as the antigen, execute the following:
{
  label of \(picked_{antigen}\) = label(antigen(i));
  Pick the class corresponding to the label of the antigen. Let this class be \(class_i\).
  Do {
    - Compute the affinity measure, i.e. the inner product of the chosen antigen with each member of the predetermined antibody set (\(N_i\)) of the class, and store the values in a local array.
    - Choose the best \(n_1\) antibodies from the antibody set, i.e. those having the highest affinity values.
    - Generate additional features using the principles of clonal selection. This process yields \(n_2\) new antibodies.
    - Choose the best \(N_i\) antibodies from the total of the old (\(N_i\)) and the new (\(n_2\)).
  }
}
This process leads to optimal maturation of the antibodies in the memory cells of each class using the labelled data only. This is the clonal selection process that optimally augments the labelled data. Once the classifier has been trained in this way, data can be passed through the AIHC to ascertain their labels (classes). The class that shows maximum affinity is chosen as the output class. The output is compared with the true label, and the error is calculated and backpropagated. The entire process is presented in Fig. 3. In this way both the memory cells, which are the trained representative antibodies of each class, and the kernels of the convolutional layers get trained for each class.
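The maturation step for a single antigen can be sketched as follows. This is a minimal illustration under stated assumptions: clones are produced by Gaussian perturbation with a fixed `sigma`, each selected parent yields `clones_per_cell` clones (so that \(n_2\) = `n1 * clones_per_cell`), and all function and parameter names are hypothetical. The exact cloning and mutation operators of the AIHC are not specified in this section.

```python
import random

def inner_product(a, b):
    """Affinity between an antigen and an antibody."""
    return sum(x * y for x, y in zip(a, b))

def mature_memory_cells(antigen, cells, n1=2, clones_per_cell=3,
                        sigma=0.1, rng=None):
    """One clonal-selection step for the memory cells of the antigen's class.

    cells: current antibody set of the class (N_i feature vectors).
    Returns the best N_i antibodies among the old cells and the new clones.
    """
    if rng is None:
        rng = random.Random(0)
    pool_size = len(cells)

    def affinity(cell):
        return inner_product(antigen, cell)

    # Select the n1 antibodies with the highest affinity to the antigen.
    parents = sorted(cells, key=affinity, reverse=True)[:n1]
    # Clone each parent with Gaussian mutation (assumed operator).
    clones = [[w + rng.gauss(0.0, sigma) for w in parent]
              for parent in parents
              for _ in range(clones_per_cell)]
    # Keep the best N_i antibodies from old and new together.
    return sorted(cells + clones, key=affinity, reverse=True)[:pool_size]
```

Because the survivors are the top \(N_i\) of a superset of the old pool, the affinity of the pool to the antigen can only stay the same or improve at each step.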
3.2 Enhancement of the Model with Misclassification Error: Module 2
This process runs from epoch \(t_1\) to epoch \(t_2\) in Fig. 2. Further training is done using the partially trained and optimally populated supervised CNN-AIHC obtained in the previous stage. This module is explained in Fig. 4. The misclassification at the first (original) output layer is used to produce additional data for further training at the feature level. The property of convergence is directly related to misclassification.
The entire dataset has images from each class. The feature set is divided into blocks, one per class, so one class has \(n_1\) feature vectors, each of size d, and another class may have \(n_2\) feature vectors of size d. Each feature vector in the feature set is fed to the semi-trained model and its misclassification is calculated. This step provides the error corresponding to each feature vector of each class. Based on this error, the clonal selection and mutation process is performed within each class, leading to the creation of new additional data.
Clonal Rate \(\propto \) 1/error and
Mutation Rate \(\propto \) 1/error.
The process results in the generation of artificial clonal training data based on misclassification. The SS-AIHC model now has new batches along with the original batches. The entire set of data is used to mature the memory cells exactly as explained for module 1. All batches are given to the model and the error generated is backpropagated as usual. The newly generated training data strengthen the model in its learning and hence improve the accuracy of the overall system. Figure 5 illustrates the memory-maturation process.
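A minimal sketch of error-driven clone generation follows. It uses the two proportionalities exactly as stated above (both rates inversely proportional to the error); the Gaussian mutation operator, the scale constants, the caps `max_clones` and `max_sigma`, and the function name `clonal_augment` are all assumptions made for illustration.

```python
import random

def clonal_augment(features, errors, clone_scale=1.0, mutation_scale=0.01,
                   max_clones=5, max_sigma=0.5, eps=1e-6, rng=None):
    """Generate artificial feature vectors for one class.

    features: feature vectors of the class.
    errors:   misclassification error of each feature vector (non-negative).
    The number of clones and the mutation spread both scale with 1/error,
    capped so that a near-zero error does not explode either quantity.
    """
    if rng is None:
        rng = random.Random(0)
    new_data = []
    for vec, err in zip(features, errors):
        inv = 1.0 / (err + eps)                                   # 1 / error
        n_clones = min(max_clones, int(round(clone_scale * inv)))  # clonal rate
        sigma = min(max_sigma, mutation_scale * inv)               # mutation rate
        for _ in range(n_clones):
            new_data.append([v + rng.gauss(0.0, sigma) for v in vec])
    return new_data
```

The generated vectors would then be appended to the class's training batches before the memory cells are matured again.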
The model now progresses to its final stage.
3.3 Semi-Supervised Convolution-AIS: Module 3
Modules 1 and 2 are the pretraining stages that precede the actual semi-supervised stage. Together they result in a trained supervised integrated Convolutional-AIS classifier that uses the labelled data and the misclassification error. To further strengthen the learning system, the model now uses the unlabelled data. The module is shown in Fig. 6.
The following steps are undertaken:
1. Use the model that has been trained with the labelled data and the misclassification error, a task accomplished in the first two modules.
2. Apply this semi-trained model to the unlabelled data and learn their labels.
3. After ascertaining the labels of the unlabelled data, mature the memory cell population of each class using this newly labelled data and the initial labelled data.
4. Retrain the framework using the entire data.
5. Repeat steps 2–4 until the convergence condition is achieved.
The unlabelled data thus help in the learning and training process of the model, which subsequently improves the accuracy. This novel approach of using Convolutional-AIS can address the small data problem in a semi-supervised environment. The model is now a trained SS-AIHC (Semi-Supervised Artificial Immune Hybrid Classifier), which can be used to learn and classify test data.
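The control flow of the semi-supervised stage can be sketched as below. The convergence test used here (pseudo-labels unchanged between two consecutive rounds) is an assumption, since the convergence condition is not spelled out above. `mean_cells` is a hypothetical stand-in for the full maturation and retraining of the CNN and memory cells, and operates on precomputed feature vectors.

```python
def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict(memory, x):
    """Class whose memory cells show the highest affinity to x."""
    return max(memory,
               key=lambda c: max(inner_product(x, cell) for cell in memory[c]))

def mean_cells(X, y):
    """Hypothetical retraining step: one memory cell per class (class mean)."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [[sum(col) / len(rows) for col in zip(*rows)]]
            for label, rows in groups.items()}

def semi_supervised_stage(memory, X_lab, y_lab, X_unlab,
                          retrain=mean_cells, max_rounds=20):
    """Pseudo-label, retrain on all data, repeat until labels stabilise."""
    previous, pseudo = None, []
    for _ in range(max_rounds):
        # Step 2: label the unlabelled data with the current model.
        pseudo = [predict(memory, x) for x in X_unlab]
        # Step 5: stop when the pseudo-labels no longer change (assumed test).
        if pseudo == previous:
            break
        # Steps 3-4: mature memory cells and retrain on labelled + pseudo-labelled data.
        memory = retrain(list(X_lab) + list(X_unlab), list(y_lab) + pseudo)
        previous = pseudo
    return memory, pseudo

X_lab, y_lab = [[1.0, 0.0], [0.0, 1.0]], [0, 1]
X_unlab = [[0.9, 0.2], [0.1, 0.8], [0.7, 0.3], [0.2, 0.9]]
memory, pseudo = semi_supervised_stage(mean_cells(X_lab, y_lab),
                                       X_lab, y_lab, X_unlab)
```

In the full model, `retrain` would rerun the clonal selection maturation and backpropagation on the combined data rather than simply averaging feature vectors.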
4 Experiments and Results
We tested the trained SS-AIHC model with the data from personal data collections as well as on standard datasets.
For each test sample do
{
  - Extract the features of the image using the trained convolutional and subsampling layers.
  - Compare the extracted features against the pre-populated memory cells of each class, following the two-layer classification process of our AIHC.
  - Calculate the affinity with each memory cell.
  - Choose the class having the maximum affinity value as the output class.
  - The photo is correctly classified if the output class matches its true label.
}
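The test procedure above can be sketched in a few lines. This is a simplified illustration: the features are assumed to have been extracted already, the two-layer classification process of the AIHC is collapsed into a single maximum-affinity step, and the class names and values are made up for the example.

```python
def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def classify(feature, memory_cells):
    """Return the class whose memory cells give the maximum affinity."""
    best_class, best_affinity = None, float("-inf")
    for label, cells in memory_cells.items():
        affinity = max(inner_product(feature, cell) for cell in cells)
        if affinity > best_affinity:
            best_class, best_affinity = label, affinity
    return best_class

def error_rate(features, labels, memory_cells):
    """Percentage of test samples whose output class differs from the true label."""
    wrong = sum(classify(f, memory_cells) != y for f, y in zip(features, labels))
    return 100.0 * wrong / len(labels)

# Illustrative memory cells and test features (not taken from the experiments).
memory_cells = {"indoor": [[1.0, 0.0], [0.8, 0.1]], "outdoor": [[0.0, 1.0]]}
test_features = [[0.9, 0.1], [0.2, 0.7], [0.6, 0.5]]
test_labels = ["indoor", "outdoor", "outdoor"]
test_error = error_rate(test_features, test_labels, memory_cells)
```

In this toy example the third sample is assigned to "indoor", so one of the three samples is misclassified.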
Experiment 1: Results on MNIST dataset: We compared our semi-supervised learning results with the existing results of Pitelis's Atlas RBF, MTC (manifold tangent classifier) using CAE (contractive autoencoders), TSVM (transductive SVM), NN (nearest neighbours) and CNN (convolutional neural networks) [8]. The semi-supervised dataset for learning was built by dividing the 50,000 samples into an unlabelled set and a labelled set, with labelled sets of size 100, 500 and 1000, respectively. As expected, accuracy was higher for larger numbers of labelled data, in the thousands. Unlike all these models, our architecture performs well with a smaller number of labelled data and would have a cost of a similar order to these alternatives. Table 1 tabulates the errors of these standard semi-supervised techniques on MNIST data. Atlas RBF follows a two-step approach and is intended specifically for high-dimensional data. The manifold of the data is approximated in the original space using low-dimensional affine charts, completely unsupervised. The second step uses SVM-based supervised learning. The data points are given soft assignments to the low-dimensional affine charts. The unlabelled data are used to capture the detailed shape of the underlying manifolds, which helps improve the accuracy of a classifier trained with minimal labelled data. Although the method has recorded better results, its ability and accuracy on personal data collections remain unexplored.
Experiment 2: Results on SVHN dataset: We compare classification performance on the far more complex SVHN image dataset with some other techniques [8] from the literature. The results reported in the literature for classification on SVHN use 1000 labels. Table 2 shows that our model works well with SVHN data too.
The encouraging results on the two standard datasets demonstrate the efficacy of our model. Our model records superior performance on the standard datasets using a smaller number of labelled data.
Experiment 3: Results on Personal Photos: We performed experiments on our own dataset, which is uploaded at https://github.com/vandnabhalla/Database. Table 3 presents the results of the Semi-Supervised AIHC model.
5 Discussions and Conclusions
The Deep CNN-AIS Semi-Supervised model, our contribution in this work, shows a definite improvement in accuracy on the standard datasets as well as on our application dataset. Tables 1 and 2 show the superiority of our model on the standard datasets. Table 3 shows the consistency of the model's performance on a unique dataset comprising personal photos. We observe that with sufficiently large amounts of unlabelled data a better classifier can be realized than with labelled data alone. Our model is able to perform better than most previous methodologies implemented for these environments. There are many applications in which unlabelled data are abundant while labelled data are scarce. The Semi-Supervised Hybrid Deep Convolutional–Artificial Immune System Architecture (SS-AIHC) uses the knowledge in a small annotated dataset to build a larger database, integrating the principles of clonal selection from artificial immune systems with a deep CNN. The learning is subsequently enhanced using unlabelled data too. The properties of the available data need to be exploited to sharpen the boundaries of accurate classification decisions. We explore generative models with a semi-supervised learning approach and have developed a new hybrid model that generalizes effectively starting from a small hand-labelled dataset. The model is self-learning, and the labelled data are augmented with the most confident predictions. Our experiments show that after augmenting data with artificial immune system techniques, deep generative models can bring about considerable enhancement in semi-supervised settings. Our problem is exciting yet exacting for the following reasons:
- It is arduous to hand label any dataset manually. We have used a personal photo collection as an example dataset in addition to the standard MNIST and SVHN datasets.
- Clustering such similar datasets with many classes is challenging, especially with few labelled instances and large amounts of unlabelled data.
References
1. Bhalla, V., Chaudhury, S.: Artificial immune hybrid photo album classifier. In: Proceedings of International Conference on Computer Vision and Image Processing CVIP 2016, vol. 1 (2016)
2. Bhalla, V., Chaudhury, S., Jain, A.: A novel hybrid CNN-AIS visual pattern recognition engine, pp. 215–224. Springer International Publishing, Cham (2015)
3. Culp, M., Michailidis, G.: An iterative algorithm for extending learners to a semi-supervised setting. J. Comput. Graph. Stat. 17(3), 545–571 (2008)
4. De Castro, L.N., Von Zuben, F.J.: Learning and optimization using the clonal selection principle. IEEE Trans. Evol. Comput. 6(3), 239–251 (2002)
5. Fergus, R., Weiss, Y., Torralba, A.: Semi-supervised learning in gigantic image collections. In: Bengio, Y., Schuurmans, D., Lafferty, J.D., Williams, C.K.I., Culotta, A. (eds.) Advances in Neural Information Processing Systems 22, Curran Associates, Inc., pp. 522–530 (2009)
6. Glorot, X., Bordes, A., Bengio, Y.: Domain adaptation for large-scale sentiment classification: a deep learning approach. In: Getoor, L., Scheffer, T. (eds.) ICML, Omnipress, pp. 513–520 (2011)
7. Jiao, L.C., Shang, F., Wang, F., Liu, Y.: Fast semi-supervised clustering with enhanced spectral embedding. Pattern Recogn. 45(12), 4358–4369 (2012)
8. Kingma, D.P., Mohamed, S., Rezende, D.J., Welling, M.: Semi-supervised learning with deep generative models. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 27, Curran Associates, Inc., pp. 3581–3589 (2014)
9. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
10. Lee, C.H., Liu, C.L., Hsaio, W.H., Gou, F.S.: Semi-supervised linear discriminant clustering. IEEE Trans. Cybern. 44(7), 989–1000 (2014)
11. Liu, T., Rosenberg, C., Rowley, H.A.: Clustering billions of images with large scale nearest neighbor search (2007)
12. Maeireizo, B., Litman, D., Hwa, R.: Co-training for predicting emotions with spoken dialogue data. In: Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions (Stroudsburg, PA, USA), ACLdemo ’04, Association for Computational Linguistics (2004)
13. Ororbia II, A.G., Reitter, D., Wu, J., Lee Giles, C.: Online learning of deep hybrid architectures for semi-supervised categorization. In: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD, Porto, Portugal, September 7–11, 2015, Proceedings, Part I, pp. 516–532 (2015)
14. Pitelis, N., Russell, C., Agapito, L.: Semi-supervised Learning Using an Unsupervised Atlas, pp. 565–580. Springer, Berlin, Heidelberg (2014)
15. Riloff, E., Wiebe, J., Wilson, T.: Learning subjective nouns using extraction pattern bootstrapping. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, vol. 4 (Stroudsburg, PA, USA), CONLL ’03, Association for Computational Linguistics, pp. 25–32 (2003)
16. Rosenberg, C., Hebert, M., Schneiderman, H.: Semi-supervised self-training of object detection models. In: WACV/MOTION, pp. 29–36. IEEE Computer Society (2005)
17. Tuzel, O., Anand, S., Mittal, S., Meer, P.: Semi-supervised kernel mean shift clustering. IEEE Trans. Pattern Anal. Mach. Intell. 36(6), 1201–1215 (2014)
18. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.-A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010)
19. Wang, D., Gao, X., Wang, X.: Semi-supervised nonnegative matrix factorization via constraint propagation. IEEE Trans. Cybern. 46(1), 233–244 (2016)
20. Xiong, S., Azimi, J., Fern, X.Z.: Active learning of constraints for semi-supervised clustering. IEEE Trans. Knowl. Data Eng. 26(1), 43–54 (2013)
21. Zeng, H., Cheung, Y.-M.: Semi-supervised maximum margin clustering with pairwise constraints. IEEE Trans. Knowl. Data Eng. 24(5), 926–939 (2012)
22. Zhang, J., Tian, G., Mu, Y., Fan, W.: Supervised deep learning with auxiliary networks. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York, NY, USA), KDD ’14, pp. 353–361. ACM (2014)
23. Zheng, L., Li, T.: Semi-supervised hierarchical clustering. In: Proceedings of the IEEE 11th International Conference on Data Mining, pp. 982–991 (2011)
24. Zhu, X.: Semi-supervised learning literature survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison (2005)
25. Zhu, X., Goldberg, A.B., Brachman, R., Dietterich, T.: Introduction to Semi-supervised Learning. Morgan and Claypool Publishers (2009)
© 2020 Springer Nature Singapore Pte Ltd.
Cite this paper
Bhalla, V., Chaudhury, S. (2020). Integrated Semi-Supervised Model for Learning and Classification. In: Chaudhuri, B., Nakagawa, M., Khanna, P., Kumar, S. (eds) Proceedings of 3rd International Conference on Computer Vision and Image Processing. Advances in Intelligent Systems and Computing, vol 1022. Springer, Singapore. https://doi.org/10.1007/978-981-32-9088-4_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-32-9087-7
Online ISBN: 978-981-32-9088-4