1 Introduction

An inherent property of digital images is their large, and often varying, dimensionality. A typical image can contain thousands, or even millions, of pixels, meaning that the curse of dimensionality [2] can be devastating if not handled properly. This fact has led to the development of a multitude of techniques for computing compressed image representations that attempt to retain the important information. Moreover, it is often required that, for a database of images, the pairwise similarities between images are approximately preserved through the extraction process. This is vital for the performance of image classification systems, which rely on the quantitative viewpoint provided by a collection of representations. In general, it is easier to design and train a classification system that uses less complex decision functions (e.g. linear) than one that relies on more complex decision functions. Hence, a feature extraction technique which is suitable for classification should transform an image database such that the categories one wishes to distinguish form compact and linearly separable “clusters” in the feature space.

Many of the widely used feature extraction techniques are based on the idea of one or more image descriptors, which can be computed either over the whole image (globally) or from certain subregions of the image (locally). An example of a global descriptor is the Histogram of Oriented Gradients (HOG), which has been successfully applied to human recognition and tracking [8]. Examples of local descriptors include the Scale Invariant Feature Transform (SIFT) [15], Speeded Up Robust Features (SURF) [1], and the Gradient Location and Orientation Histogram (GLOH) [17].

With these local descriptors, the main idea is to identify points of interest in the input image, and then compute vectorial descriptors based on neighborhoods around these points. The SIFT algorithm identifies keypoints by locating extrema in a scale space, which is constructed by convolving the input image with difference-of-Gaussian filters at different scales, an approximation of the Laplacian of Gaussian. The SURF algorithm is based on a similar approach, in which the scale space is constructed using box filters of varying sizes instead of Gaussians.

Once a set of local descriptors has been computed for an image, these have to be further quantized into a single vector representing the entire image. This can be achieved using the Bag of Visual-words (BOV) [7], in which all descriptors from the entire dataset are clustered with \( k \)-means, thus creating a vocabulary of visual-words. The feature vector for a given image is then the histogram produced when assigning each of its descriptors to one of the visual-words. Another approach to quantization is the Improved Fisher Vector (IFV) [11], which fits a Gaussian mixture model to the complete set of descriptors. Then, for a given image, the feature vector is constructed based on the average gradient of the mixture’s log-likelihood, evaluated at each of the descriptors extracted from the image.
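As an illustration of the BOV encoding step, the following minimal Python sketch builds a visual vocabulary with scikit-learn's KMeans and produces a normalized histogram for a single image. The function names are placeholders, scikit-learn is used instead of the KMCuda-based pipeline of Sect. 3.4, and the PCA/position-augmentation step described later is omitted for readability.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, n_words=4096):
    """Cluster descriptors pooled from the whole dataset into visual words."""
    return KMeans(n_clusters=n_words, n_init=1).fit(all_descriptors)

def bov_encode(image_descriptors, vocabulary):
    """Histogram of visual-word assignments for the descriptors of one image."""
    words = vocabulary.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)  # L1-normalize so images with different keypoint counts are comparable
```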

Recently, the image processing community has seen a shift in methodology with the introduction of models based on deep learning. The convolutional neural network (CNN) [14] has received much attention due to advancements in training strategies and computational capacity [13]. Although the vast majority of CNN-based models are applied to supervised problems, there are no direct restrictions within the CNN architecture that prevent unsupervised tasks. In other words, with a modified training regime, a CNN-based model can be trained for feature extraction in an unsupervised manner.

In an unsupervised feature-extracting CNN, the learned feature vector, and therefore also its quality with respect to the task at hand, will depend on the large number of parameters contained in the network. The development of training strategies, and more specifically loss functions, for training deep learning architectures in an unsupervised manner is a field of research which is still in its early stages. Notable contributions within this field include loss functions designed for joint feature learning and clustering [26, 27], as well as the Convolutional Autoencoder (CAE) [16].

The CAE consists of two CNNs, referred to as the encoder and decoder, respectively. The task of the encoder is to embed the input image in a vector space, while the task of the decoder is to reconstruct the input image based only on the embedded representation. These networks are trained together to minimize the mean squared reconstruction error, causing the encoder to learn a mapping which aims to preserve as much information about the input image as possible. When the training process terminates, the decoder is discarded and the encoder can be used as a feature extractor.
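To make the encoder/decoder split concrete, below is a minimal Keras sketch of a convolutional autoencoder trained on the mean squared reconstruction error. The input shape and layer sizes are illustrative assumptions and do not correspond to the architecture of [16] or to Table 4 in Appendix A.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_cae(input_shape=(32, 32, 3)):
    inp = layers.Input(shape=input_shape)
    # Encoder: strided convolutions reduce the spatial resolution.
    x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(inp)
    code = layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(x)
    # Decoder: upsampling and convolutions reconstruct the input image.
    x = layers.UpSampling2D(2)(code)
    x = layers.Conv2D(32, 3, padding='same', activation='relu')(x)
    x = layers.UpSampling2D(2)(x)
    out = layers.Conv2D(input_shape[-1], 3, padding='same', activation='sigmoid')(x)

    autoencoder = Model(inp, out)
    encoder = Model(inp, code)  # kept as the feature extractor once training is done
    autoencoder.compile(optimizer='adam', loss='mse')  # mean squared reconstruction error
    return autoencoder, encoder
```

After fitting `autoencoder`, only `encoder` is retained; its output feature map is flattened to obtain the vectorial representation.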

In this article, we propose the Unsupervised Convolutional Siamese Network (UCSN): a new deep-learning-based feature extractor consisting of a convolutional neural network that is trained end-to-end to learn an adaptive neighborhood embedding in an unsupervised manner, similarly to the weakly supervised Siamese Networks [6]. In contrast to the convolutional autoencoder [16], our model does not rely on a decoder network during training.

The rest of the paper is structured as follows: In Sect. 2, we provide an in-depth explanation of the proposed architecture. In Sect. 3, we explain the details of our experiments. In Sects. 4 and 5 we present and discuss our results, respectively. Finally, we provide some concluding remarks in Sect. 6.

2 Method

In this section we introduce the concept of siamese networks, and provide a detailed explanation of our proposed Unsupervised Convolutional Siamese Network.

2.1 Siamese Networks

Feature extraction can be regarded as learning a mapping from some input space \( X \) to an output space \( Y \):

$$\begin{aligned} f_{\varvec{\theta }} : X \rightarrow Y \subseteq \mathbb {R}^D \end{aligned}$$

where \( f \) denotes the parameterized mapping, and \( \varvec{\theta }\) is a vector of parameters. \( D \) denotes the dimensionality of the extracted features, and depends on the design of the function \( f \). In a siamese network [6, 10, 20], \( f \) is parameterized by either a fully-connected neural network or by a convolutional neural network. In both these cases, the final layer is responsible for producing the extracted feature. The chosen network is trained to minimize the contrastive loss function:

$$\begin{aligned} \mathcal {L}_c = \sum \limits _{i=1}^{n-1}\sum \limits _{j=i+1}^{n} \left( a_{ij} ||\varvec{y}_i - \varvec{y}_j ||^2 + (1-a_{ij}) \max (0, c - ||\varvec{y}_i - \varvec{y}_j ||)^2 \right) , \end{aligned}$$
(1)

where \( c \) is a hyperparameter, \( ||\cdot || \) denotes the Euclidean norm on \( Y \), \( \varvec{y}_i = f_{\varvec{\theta }}(\varvec{x}_i) \) is the feature extracted from the input observation \( \varvec{x}_i \), and \( a_{ij} \) is an indicator variable defined as

$$ a_{ij} = {\left\{ \begin{array}{ll} 1, \quad (i,j) \text { is a positive pair} \\ 0, \quad (i,j) \text { is a negative pair} \end{array}\right. }. $$

Two observations constitute a positive pair if they belong to the same class or category. Minimizing the contrastive loss function in Eq. (1) causes the network to learn a discriminative embedding by minimizing the distance within embedded positive pairs, while ensuring that the distance within negative pairs is sufficiently large. Thus, the network learns to embed points from the same class close to each other, while ensuring a large distance between points from different classes. The dependency on the \( a_{ij} \) in the loss function means that a siamese network requires weakly labeled data in the training process, making it a supervised model.
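For concreteness, the following is a direct, unoptimized NumPy transcription of Eq. (1); the margin value \( c = 1 \) is an arbitrary placeholder.

```python
import numpy as np

def contrastive_loss(Y, A, c=1.0):
    """Eq. (1): Y is an (n, D) array of embeddings y_i, A an (n, n) 0/1 matrix of the a_ij."""
    n = Y.shape[0]
    loss = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            dist = np.linalg.norm(Y[i] - Y[j])              # ||y_i - y_j||
            loss += A[i, j] * dist**2                       # pull positive pairs together
            loss += (1 - A[i, j]) * max(0.0, c - dist)**2   # push negative pairs beyond the margin c
    return loss
```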

Fig. 1. Outline of the proposed architecture. The loss function considers distance structure in both the input space and the output space, ensuring that similar images are mapped to similar points, and vice versa. For more details on the exact CNN architecture, please see Table 4, Appendix A.

2.2 The Unsupervised Convolutional Siamese Network

To eliminate the need for weakly labeled data, we propose to construct positive and negative pairs based on a distance function in the input space. An overview of the architecture is provided in Fig. 1. The CNN receives as input a set of \( n \) images \( \varvec{x}_1, \dots , \varvec{x}_n \in X \), and produces the learned features \( \varvec{y}_1, \dots , \varvec{y}_n \in Y \). The model is trained by minimizing the loss function in Eq. (1); however, the \( a_{ij} \) are now constructed in an unsupervised manner.

More specifically, the \( a_{ij} \) are constructed by labeling a pair as positive if the distance between them is less than \( \varepsilon \), and negative otherwise:

$$ a_{ij} = {\left\{ \begin{array}{ll} 1, \quad d(\varvec{x}_i, \varvec{x}_j) < \varepsilon \\ 0, \quad d(\varvec{x}_i, \varvec{x}_j) \ge \varepsilon \end{array}\right. }, $$

where \( d \) is a suitable symmetric distance function on \( X \), and \( \varepsilon \) is chosen such that the probability of two random training points being marked as a positive pair is approximately \( 0.02 \). This choice resulted in well-separated embeddings.

Note that a similar modification was previously made in [21], but only the pixel-wise Euclidean distance was considered there. To construct a more robust distance function, we compute the Euclidean distance between HOG descriptors extracted from the respective images:

$$\begin{aligned} d(\varvec{x}_i, \varvec{x}_j) = || \text {HOG}(\varvec{x}_i) - \text {HOG}(\varvec{x}_j) ||. \end{aligned}$$
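A minimal sketch of this pair-construction step is given below, using scikit-image's `hog` and SciPy for brevity (the experiments in Sect. 3.4 use OpenCV descriptors); grayscale input images of equal size and the 2% positive-pair rate mentioned above are assumed.

```python
import numpy as np
from skimage.feature import hog
from scipy.spatial.distance import pdist, squareform

def pair_labels(images, positive_rate=0.02):
    """Build the a_ij matrix of Sect. 2.2 from HOG distances in the input space."""
    descriptors = np.stack([hog(img) for img in images])  # one HOG vector per (grayscale) image
    dists = squareform(pdist(descriptors))                # d(x_i, x_j) for all pairs
    upper = dists[np.triu_indices_from(dists, k=1)]       # distances for i < j
    eps = np.quantile(upper, positive_rate)               # ~2% of random pairs become positive
    return (dists < eps).astype(int)                      # a_ij = 1 if d(x_i, x_j) < eps
```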

3 Experiment Setup

3.1 Classifier

To quantitatively assess the performance of the extracted features with respect to classification, we train a linear support vector machine (SVM) [3, 22] to classify the learned features. The linear SVM was chosen because it is fast to train and has been successfully applied to many classification tasks [23], making it a suitable classification component in the benchmarking process. We use a one-vs-rest SVM from the scikit-learn Python module, with default hyperparameters.
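A minimal sketch of this evaluation step, assuming the features and labels have already been computed as NumPy arrays:

```python
from sklearn.svm import LinearSVC

def evaluate_features(train_features, train_labels, test_features, test_labels):
    """Fit a linear SVM (one-vs-rest, default hyperparameters) and report test accuracy."""
    clf = LinearSVC()  # linear decision function; multi-class handled one-vs-rest by default
    clf.fit(train_features, train_labels)
    return clf.score(test_features, test_labels)
```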

3.2 Models

In our experiments we evaluate two different versions of the UCSN. UCSN-Euc uses the Euclidean distance to determine positive and negative pairs, while UCSN-HOG uses the distance between HOG-features. We compare the proposed models to several state-of-the-art methods for image feature extraction [5]:

  • SIFT-BOV: SIFT descriptors encoded with the bag of visual words (BOV) encoding, with a vocabulary size of \( 4096 \) visual words. The \( 128 \)-dimensional descriptors were transformed to \( 80 \)-dimensional vectors using PCA before running \( k \)-means; this dimensionality reduction has been empirically found to improve classification accuracy in similar models [4]. Each of the transformed descriptors was augmented with the normalized position of its respective keypoint, i.e. \( (x/W, y/H) \), where \( (x,y) \) is the position of the keypoint in the image, and \( W \) and \( H \) are the width and height of the image, respectively (see the sketch after this list).

  • SIFT-IFV: SIFT descriptors encoded using the improved Fisher vector (IFV) encoding with \( 256 \) mixture components. Similarly to SIFT-BOV, the descriptors were reduced in dimensionality and augmented before fitting the Gaussian mixture model.

  • SURF-BOV: Same as SIFT-BOV, but with SURF descriptors instead. Note that the \( 128 \)-dimensional extended SURF descriptor was used. Dimensionality reduction with PCA and position augmentation were performed before running \( k \)-means.

  • SURF-IFV: Same as SIFT-IFV, but with the \( 128 \)-dimensional extended SURF descriptor. Dimensionality reduction with PCA and position augmentation were performed before fitting the Gaussian mixture model.

  • HOG: The HOG descriptor.

  • CAE: A fully convolutional autoencoder as described in [16]. Note that we train the model end-to-end instead of layer-wise, and use standard upsampling layers in the decoder, instead of the pooling scheme suggested in [16]. These modifications were made to reduce the computational complexity associated with training the model.
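The PCA and position-augmentation step shared by the four descriptor-based benchmarks can be sketched as follows; the helper name and the assumption that the PCA model has already been fitted with 80 components on the pooled descriptors are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

def augment_descriptors(descriptors, keypoints, image_size, pca):
    """Reduce 128-dim local descriptors to 80 dims and append normalized keypoint positions."""
    W, H = image_size
    reduced = pca.transform(descriptors)                        # (n_keypoints, 80)
    positions = np.array([(x / W, y / H) for (x, y) in keypoints])
    return np.hstack([reduced, positions])                      # (n_keypoints, 82)

# pca = PCA(n_components=80).fit(all_descriptors)  # fitted once on descriptors from the whole dataset
```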

The dimensionality of the extracted features for the models can be found in Table 1, and the complete list of hyperparameter values for all models can be found in Table 4, Appendix A.

Note that the dimensionality of the extracted features is very large for some of the models and datasets. For the models based on local descriptors, the dimensionality can be adjusted by tuning either the number of clusters in BOV, or the number of mixture components in IFV. However, as pointed out in [4], using fewer than 4096 clusters, or fewer than 256 mixture components, can lead to a drop in performance.

Table 1. Dimensionalities of the extracted features. The last column indicates whether the dimensionality is a hyperparameter in the model.

3.3 Datasets

The datasets chosen for evaluation are listed in Table 2. These constitute a range of object recognition tasks, without introducing auxiliary challenges such as multiple labels, large amounts of noise, or very high resolutions. Each dataset was randomly split into a training set, a validation set, and a test set: \( 80\ \% \) of the images were used for training, \( 10\ \% \) for hyperparameter validation, and the remaining \( 10\ \% \) for performance evaluation. Note that the training set was used to train both the feature extractors and the corresponding SVMs.

Table 2. Properties of the different datasets used for evaluation.

3.4 Implementation

The experiments were implemented using the Python programming language, with the NumPy and SciPy modules for numerical computations. The HOG, SIFT and SURF descriptors were computed using the OpenCV module, while BOV and IFV were implemented on top of the KMCuda and scikit-learn modules, respectively. The SVM from the scikit-learn module was used, while the UCSN and the CAE were implemented and trained using the TensorFlow framework. Both these models were trained for 200 iterations, using stochastic mini-batches of size \( 1024 \) and the Adam optimizer [12].
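As an illustration of how a mini-batch update could look in TensorFlow, the sketch below combines a batched version of Eq. (1) with the Adam optimizer. The encoder model and the float \( a_{ij} \) matrix for the batch are assumed to be provided (e.g. by the pair-labeling sketch in Sect. 2.2), and the margin and the small constant inside the square root are illustrative choices, not the values used in our experiments.

```python
import tensorflow as tf

def batch_contrastive_loss(y, a, c=1.0):
    """Batched Eq. (1): y is an (n, D) tensor of embeddings, a an (n, n) float matrix of the a_ij."""
    diff = y[:, None, :] - y[None, :, :]
    d = tf.sqrt(tf.reduce_sum(tf.square(diff), axis=-1) + 1e-12)  # pairwise distances (eps avoids NaN gradients)
    pair_losses = a * tf.square(d) + (1.0 - a) * tf.square(tf.nn.relu(c - d))
    mask = tf.linalg.band_part(tf.ones_like(d), 0, -1) - tf.eye(tf.shape(d)[0])  # keep only i < j
    return tf.reduce_sum(pair_losses * mask)

optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(encoder, x_batch, a_batch):
    with tf.GradientTape() as tape:
        y = encoder(x_batch, training=True)
        loss = batch_contrastive_loss(y, a_batch)
    grads = tape.gradient(loss, encoder.trainable_variables)
    optimizer.apply_gradients(zip(grads, encoder.trainable_variables))
    return loss
```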

Table 3. Classification accuracies for the different models and datasets.

4 Results

The classification accuracies obtained from the different models and datasets are shown in Table 3. These indicate that the performance of the UCSN-HOG is comparable to the benchmark models. For the SVHN dataset, we observe that both UCSN-HOG and HOG outperform the rest of the models. This indicates that the HOG descriptors are suitable for discriminating between classes, and that the UCSN-HOG is able to exploit this information when extracting its features. This is not observed with the UCSN-Euc model, which performs significantly worse than all other models.

For the Fashion-MNIST dataset, UCSN-Euc is not as far behind, but it still performs worse than UCSN-HOG, HOG, and the CAE. This again implies that these three models are able to extract features with better discriminative properties. On the other hand, the results for the Cats and Dogs dataset show that all deep-learning-based models perform worse than the other benchmark methods. Interestingly, this is the opposite of what is commonly observed for fully supervised models, where CNN-based classifiers regularly outperform the classical approaches [5, 13].

To further evaluate the validity of our results, the Fashion-MNIST features were projected to two dimensions using \( t \)-SNE [24] for selected models. Figure 2 shows the transformed features. These plots reveal a large difference in class separability, with the SURF-IFV features being significantly less separable than the other three. The classes are more separated in the CAE's representation; however, the features provided by HOG and UCSN-HOG still show better class separability. There is also a noticeable similarity between the HOG features and the UCSN-HOG features, which is expected, since UCSN-HOG is trained to preserve local HOG distances.
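The projections in Fig. 2 can be reproduced, up to hyperparameter choices, with scikit-learn's t-SNE; the default perplexity and the fixed seed below are assumptions made for the sketch.

```python
from sklearn.manifold import TSNE

def project_2d(features, random_state=0):
    """Embed extracted features in two dimensions for visual inspection, as in Fig. 2."""
    return TSNE(n_components=2, random_state=random_state).fit_transform(features)
```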

Figure 3 shows image pairs extracted from the SVHN and Cats and Dogs datasets, respectively. The normalized distances in the tables to the right are computed as \( d_\text {norm} = \frac{d}{d_{\text {max}}} \), where \( d \) is the distance between the feature vectors extracted from the two images for the given model, and \( d_\text {max} \) is the maximum observed distance between any two extracted features in the dataset, for the given model.
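A small sketch of this normalization, assuming the extracted features for one model and dataset are stacked in a single array:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def normalized_distance(features, i, j):
    """d_norm = d(y_i, y_j) / d_max, where d_max is the largest pairwise distance observed."""
    dists = squareform(pdist(features))
    return dists[i, j] / dists.max()
```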

Fig. 2. Plots of the features extracted using the HOG, SURF-IFV, CAE, and UCSN-HOG feature extractors. Note the large difference in class separability. The features were transformed to two dimensions using \( t \)-SNE [24].

Fig. 3. Image pairs and their corresponding normalized distances. The distances are normalized by dividing the distance between extracted features by the maximum observed distance, for each model and dataset.

The pair in Fig. 3a shows two images that are both labeled as “\( 1 \)”. Note that the distance between UCSN-HOG embeddings is small, even though the image backgrounds are quite dissimilar, and the rightmost image contains an additional “\( 4 \)” which acts as a distraction. This indicates that UCSN-HOG has learned to recognize the “\( 1 \)” in the images. However, in Fig. 3b the cats depicted in the images are quite similar, but the distance between UCSN-HOG embeddings is large. This shows that UCSN-HOG is unable to pick up the similarities between these two images. This is also the case for most of the benchmark models, with the exception of UCSN-Euc and SURF-BOV.

5 Discussion

Although the CNN-based models, i.e. CAE and our proposed UCSN, perform well on the SVHN and Fashion-MNIST datasets, this was not observed with the Cats and Dogs dataset, where both these models performed worse than the other benchmark methods. This indicates that, when trained in an unsupervised manner, the filters learned by the CAE and the UCSN do not produce sufficiently discriminative representations in the output space. It is possible that this problem can be alleviated by employing different initialization techniques, for instance, through transfer learning [19], where the CNN is pre-trained as a classifier on a different dataset where ground truth labels are available. After pre-training, the network can be fine-tuned to the task at hand using the loss function in Eq. (1). Another option is to consider different distance functions in the image space, as this governs the local similarity structure of the extracted features. We emphasize that the proposed architecture can work with any symmetric distance function, making it applicable to a diverse set of domains. The exploration of these potential modifications is left as future work.

6 Conclusions

In this article, we outline a new architecture for image feature extraction that is based on convolutional neural networks. The proposed Unsupervised Convolutional Siamese Network (UCSN) is trained to embed a set of images in a vector space such that local distance structure is approximately preserved. We compare the UCSN to several classical models by using the extracted features as input to a classification system. The overall results obtained from three different datasets indicate that the UCSN produces vectorial representations that are suitable for classification purposes. Finally, we suggest possible techniques for improving the performance of the UCSN in feature extraction applications.