1 Introduction

In this paper, we propose a novel training mechanism to mitigate problems such as data sparsity, high intra-class variance, and low inter-class variance, which lead to poor clustering performance. Traditional clustering algorithms such as K-means [11], Gaussian mixture models (GMM) [3], and spectral clustering [8] rely largely on a notion of distance; for example, K-means uses Euclidean distance to assign data points to clusters. Recent advances in deep learning have led to the emergence of clustering techniques parameterized by deep neural networks [2, 13, 14, 17, 18] that attempt to jointly learn representations and perform clustering, relying on tools such as stochastic gradient descent and backpropagation with a clustering objective function. This introduces the challenges of choosing an appropriate neural network architecture and a suitable clustering objective function. Although recent methods [5, 16] attempt to circumvent these problems, limited literature investigates the effect of data sparsity and high intra-class variance, which are common in crowdsourced cultural heritage datasets. Apparent architectural differences arising from data acquisition methods, together with cultural similarities across sites, might lead to the assignment of false clusters. In this paper, we empirically demonstrate the use of transformations such as random scaling, rotation, and shearing as data augmentation techniques toward increasing data density, yielding superior clustering performance.

Crowdsourcing facilitates data collection at scale: task owners rely on a large pool of largely anonymous contributors with varying expertise, each contributing a diverse amount of data. In our case, we are interested in obtaining a large image corpus of Indian heritage sites with a view toward large-scale 3D reconstruction for digital archival and preservation. An essential step in this pipeline is an efficient deep clustering method that mitigates the issues outlined above. Toward this:

  • We propose a novel training strategy to circumvent the problem of poor clustering performance by

    • introducing data augmentation as an auxiliary plug-in for deep embedded clustering

      • to densify data and facilitate better feature representation considering limited data.

      • to address data with high intra-class and low inter-class variance.

      • to augment data using affine transforms (rotation, scaling and shearing).

    • incorporating Consistency Constraint Loss (CCL) with Mean Squared Error (MSE) Loss to handle introduced transformations.

  • We demonstrate our proposed strategy on a crowdsourced Indian heritage dataset and show consistent improvements over existing works.

In Sect. 2, we discuss contemporary works related to clustering. In Sect. 3, we propose a strategy to circumvent the problem of poor clustering performance. In Sect. 4, we describe the experimental setup on the Indian Heritage dataset. In Sect. 5, we report results through quantitative and qualitative metrics, and we conclude in Sect. 6.

2 Related Works

In this section, we discuss contemporary works addressing clustering using deep features. Classical clustering techniques such as K-means [11], Gaussian Mixture Models (GMM) [3], and spectral clustering [8] are limited by their distance metrics and perform poorly when the dimensionality is high. Toward this, recent techniques such as Deep Embedded Clustering (DEC) [16] and Improved Deep Embedded Clustering (IDEC) [5] extract deep features for categorization in a lower-dimensional embedding space.

Recent advances in deep neural networks have ushered in a strategy of parameterizing clustering algorithms with neural networks. Deep Embedded Clustering (DEC) [16] pioneered the idea of using deep neural networks to learn representations and solve for cluster assignment jointly. The method uses stochastic gradient descent with backpropagation to extract deep features while simultaneously learning the underlying representations. However, as the authors of [5] point out, the choice of clustering loss tends to distort the feature space, which consequently affects the overall clustering performance. To mitigate this, they propose an under-complete autoencoder that preserves the data structure, leading to improved clustering performance. Inspired by these works, we propose a method to improve clustering performance by densifying the data distribution. We hypothesize that sparsity of the data distribution is a significant deterrent to clustering, and the problem is further exacerbated when the data exhibits high intra-class variance. We empirically show that using data augmentation as an auxiliary plug-in improves clustering performance, and extensive experiments on a cultural heritage dataset show consistent improvements over existing methods.

3 Categorization of Crowdsourced Heritage Data

Crowdsourced heritage data arrives in an incremental fashion, where the number of classes and the number of images per class are not known in advance. In practice, images belonging to one class may arrive in large numbers while very few samples arrive for other classes. This introduces the problems of class imbalance and data sparsity. Due to data sparsity, deep learning techniques used for feature representation fail at their task, leading to poor clustering performance; architectures such as Convolutional Autoencoders (CAE) are particularly sensitive to these problems. Toward this, we attempt to mitigate the data sparsity issue via data augmentation (Fig. 1).

Fig. 1 Categorization of crowdsourced heritage data: a process flow diagram in which crowdsourced data passes through a CAE (encoder and decoder) and clusters are obtained in the latent space

We increase the density of the data by performing three kinds of transformations, namely random rotation, random shear, and random scaling, on the original data; this generally makes the model more robust during learning while keeping the generated samples plausible. Many other augmentation techniques, as described in [1, 10, 12, 15], are used to increase data density and to address class imbalance [6, 19].

Convolutional Autoencoders (CAE) have proven effective for classification, clustering, and object detection. We combine the augmented data with the original data to train a CAE that generates the embeddings used for clustering.

Considering \(x_i\), \(i \in \{1,\ldots,m\}\), as an image, a transformation \(t_j\), \(j \in \{1,\ldots,s\}\), applied to \(x_i\) generates the transformed image \(x_i^{t_j} = t_j(x_i)\). The total number of images after augmentation is N, where \(N = s\times m\).

The traditional objective function used for training the CAE is the Mean Squared Error (MSE) between the input \(x_i\) and the decoder output \(x_i^{'}\), defined as

$$\begin{aligned} MSE(x, x') = \frac{\sum _{i=1}^{N}{(x_i-x_i')^2}}{N} \end{aligned}$$
(1)

The MSE loss alone carries no information about the relation between the original data and the augmented data. To overcome this, we incorporate Consistency Constraints [9], which have been shown to be effective in semi-supervised learning (SSL). A Consistency Constraint Loss (CCL) is incorporated by enforcing the difference between the predictions for a data sample and its transformed counterpart (obtained by randomly rotating, shearing, or scaling the image) to be minimal. The CCL is defined as follows:

$$\begin{aligned} L_{CCL} = \frac{1}{NK}\sum _{i=1}^{N}\sum _{k=1}^{K}\parallel p(k|i) - p^{t}(k|i)\parallel \end{aligned}$$
(2)

where N represents the total number of data points, K represents the total number of clusters, p(k|i) represents the probability of assigning image \(x_i\) to cluster k, and \(p^{t}(k|i)\) represents the probability of assigning the randomly transformed image \(x_i^{t}\) to cluster k. p(k|i) is parameterized by assuming the assignments follow a Student's t-distribution:

$$\begin{aligned} p(k|i) \propto \left( 1+\frac{\parallel z_i - \mu _k \parallel ^2}{\alpha }\right) ^{-\frac{\alpha +1}{2}} \end{aligned}$$
(3)

Here \(z_i\) is the feature representation of image \(x_i\) and \(\mu _k\) is the center of cluster k. The set of cluster centers \(U = \{ \mu _k, k = 1,\ldots,K\}\) is initialized by K-means and refined as training progresses.

The overall objective function of CAE is now defined as

$$\begin{aligned} Loss = MSE(x, x') + L_{CCL} \end{aligned}$$
(4)
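To make the objective concrete, the following is a minimal PyTorch sketch of the soft assignment of Eq. (3), the consistency constraint of Eq. (2), and the combined objective of Eq. (4). It is an illustration under our reading of the equations, not the authors' implementation; the names z, z_t, and centers (flattened embeddings of original and transformed images, and the cluster centers) as well as the choice \(\alpha = 1\) are assumptions.

```python
import torch
import torch.nn.functional as F

def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment p(k|i) of embeddings z (N x d) to
    cluster centers (K x d), as in Eq. (3)."""
    dist_sq = torch.cdist(z, centers) ** 2                 # (N, K) squared distances
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)                  # normalize over clusters

def ccl_loss(p, p_t):
    """Consistency constraint of Eq. (2): mean absolute difference between
    the assignments of an image and of its transformed counterpart."""
    return (p - p_t).abs().mean()

def total_loss(x, x_rec, z, z_t, centers):
    """Overall objective of Eq. (4): reconstruction MSE plus CCL."""
    mse = F.mse_loss(x_rec, x)
    p, p_t = soft_assign(z, centers), soft_assign(z_t, centers)
    return mse + ccl_loss(p, p_t)
```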

We use K-means to quantify our results and show how augmentation improves clustering performance. We further show how data augmentation improves the performance of the existing state-of-the-art methods Deep Embedded Clustering (DEC) and Improved Deep Embedded Clustering (IDEC), where the CAE is used as the initial feature extractor, and we provide an extensive ablation study of these methods over combinations of the differently trained CAEs.
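As an illustration of this evaluation step (the names encoder and loader are placeholders for the trained CAE encoder and the image data loader), the latent codes can be clustered with scikit-learn's K-means as sketched below.

```python
import numpy as np
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def cluster_embeddings(encoder, loader, n_clusters=10, device="cpu"):
    """Encode all images with the trained CAE encoder and run K-means
    on the flattened latent codes."""
    encoder.eval()
    feats = [encoder(x.to(device)).flatten(1).cpu().numpy() for x, _ in loader]
    z = np.concatenate(feats, axis=0)
    return KMeans(n_clusters=n_clusters, n_init=20, random_state=0).fit_predict(z)
```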

4 Experiments

4.1 Dataset

We experiment extensively on the crowdsourced Indian Digital Heritage (IDH) dataset, collected through a crowdsourcing platform. We consider 10 classes of this dataset with 150 images per class, as these classes exhibit high intra-class and low inter-class variance. The considered data undergoes augmentation through random rotation, random shearing, and random scaling: random rotation is performed over the range 0–90\(^{\circ }\), random shear with a transformation intensity of 50\(^{\circ }\), and random scaling over the range 0.5–1.0. We generate around 6000 images through these transformations. The same dataset is used throughout the experimentation to keep comparisons across experiments and environments uniform.
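One possible realization of these augmentations with torchvision is sketched below; the parameter values follow the ranges stated above, while the exact sampling scheme (for example, how the shear intensity is drawn) is an assumption.

```python
from torchvision import transforms

# Ranges follow the text: rotation in 0-90 degrees, shear intensity of up to
# 50 degrees, and scaling between 0.5 and 1.0 of the original size.
random_rotation = transforms.RandomRotation(degrees=(0, 90))
random_shear = transforms.RandomAffine(degrees=0, shear=50)
random_scale = transforms.RandomAffine(degrees=0, scale=(0.5, 1.0))

def augment(pil_image):
    """Return the three transformed copies of one original image."""
    return [random_rotation(pil_image),
            random_shear(pil_image),
            random_scale(pil_image)]
```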

4.2 Training Setup

  • Runtime Environment: Nvidia GP107CL Quadro P620

  • Architecture: Autoencoder (a PyTorch sketch follows this list)

    • Encoder:

      • Contains 4 VGG Blocks

      • VGG Block has 2 Convolution Layers followed by a maxpooling layer

      • A batch normalization layer is used after each convolution layer, before the activation function

      • Activation Function: ReLU

    • Decoder:

      • The decoder consists of transposed convolution layers, each with a batch normalization layer before the activation function

      • Activation Function: ReLU, Sigmoid (Output Layer)

  • Batch Size: 16

  • Learning Rate: 0.001

  • Number of Epochs: 500 (CAE), 2000 (DEC and IDEC)

  • Optimizer: Adam
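A minimal PyTorch sketch consistent with the setup above is given below; the channel widths, kernel sizes, and 3-channel input are assumptions, since only the block structure is specified in the paper.

```python
import torch
import torch.nn as nn

def vgg_block(c_in, c_out):
    """Two convolution layers, each with batch normalization before ReLU,
    followed by max pooling."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(),
        nn.MaxPool2d(2))

def up_block(c_in, c_out, act):
    """Transposed convolution with batch normalization before the activation."""
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, 2, stride=2), nn.BatchNorm2d(c_out), act)

class CAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(vgg_block(3, 32), vgg_block(32, 64),
                                     vgg_block(64, 128), vgg_block(128, 256))
        self.decoder = nn.Sequential(up_block(256, 128, nn.ReLU()),
                                     up_block(128, 64, nn.ReLU()),
                                     up_block(64, 32, nn.ReLU()),
                                     up_block(32, 3, nn.Sigmoid()))

    def forward(self, x):
        z = self.encoder(x)          # latent representation used for clustering
        return z, self.decoder(z)

model = CAE()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)   # batch size 16 during training
```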

4.3 Evaluation Metrics

To evaluate the proposed strategy and compare it with state-of-the-art methods, we use Unsupervised Clustering Accuracy (ACC), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI).

4.3.1 Unsupervised Clustering Accuracy (ACC)

It uses a mapping function m to find the best mapping between the cluster assignments c produced by the algorithm and the ground truth labels y, and is defined as

$$\begin{aligned} ACC = max_m \frac{\sum _{i=1}^{N} 1\{y_i = m(c_i)\}}{N} \end{aligned}$$
(5)

For a given image \(x_i\), let \(c_i\) be the resolved cluster label and \(y_i\) the ground truth label; \(1\{\cdot \}\) is the delta (indicator) function [7] that equals one if its argument holds and zero otherwise, and m maps each cluster label \(c_i\) to an equivalent ground-truth label. The best mapping can be found using the Kuhn-Munkres algorithm [4].
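A minimal Python sketch of this metric, using SciPy's implementation of the Kuhn-Munkres (Hungarian) algorithm, is shown below.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Unsupervised clustering accuracy (Eq. 5): find the best one-to-one
    mapping m between cluster labels and ground-truth labels with the
    Kuhn-Munkres algorithm, then count agreements."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    count = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1                        # co-occurrence of cluster p and label t
    rows, cols = linear_sum_assignment(-count)  # negate to maximize matched pairs
    return count[rows, cols].sum() / y_true.size
```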

4.3.2 Normalized Mutual Information (NMI)

It measures the mutual information I(y, c) between the cluster assignments c and the ground truth labels y, normalized by the average entropy of the ground truth labels H(y) and the cluster assignments H(c), and is defined as

$$\begin{aligned} NMI = \frac{I(y,c)}{\frac{1}{2}[H(y) + H(c)]} \end{aligned}$$
(6)

4.3.3 Adjusted Rand Index (ARI)

It computes a similarity measure between two clusterings by considering all pairs of samples and counting pairs that are assigned to the same or different clusters in the predicted and true clusterings. It is defined as

$$\begin{aligned} ARI = \frac{Index - ExpectedIndex}{MaxIndex - ExpectedIndex} \end{aligned}$$
(7)
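For completeness, NMI and ARI can be computed directly with scikit-learn; the toy labels below are purely illustrative.

```python
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

y_true = [0, 0, 1, 1, 2, 2]   # toy ground-truth labels
y_pred = [1, 1, 0, 0, 2, 2]   # toy cluster assignments (a label permutation)

# "arithmetic" averaging matches the (H(y) + H(c)) / 2 normalization of Eq. (6)
nmi = normalized_mutual_info_score(y_true, y_pred, average_method="arithmetic")
ari = adjusted_rand_score(y_true, y_pred)     # Eq. (7)
print(nmi, ari)                               # both 1.0 for this toy example
```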

5 Results and Discussions

In this section, we discuss the results of the proposed strategy toward categorization of the crowdsourced Indian Digital Heritage (IDH) dataset and compare them with state-of-the-art methods.

We measure clustering performance by reporting unsupervised clustering accuracy, NMI, and ARI. From Table 1 we observe that the CAE trained with data augmentation outperforms the CAE trained without augmentation. We see an improvement of 2.74% when training with the MSE loss and an improvement of 15.21% when training with the combination of MSE and CCL, as the CCL mainly depends on the augmented data. This improvement is significant in the context of clustering data with high intra-class variance.

Table 1 Performance of CAE trained with and without augmented data. CAE-WAug represents CAE model trained without augmented data and CAE-Aug represents CAE model trained with augmented data

To discern the effect of the individual augmentation techniques (rotate, scale, and shear), we choose samples in the combinations {ori, rot}, {ori, sher}, and {ori, scal}. The results are presented in Table 2. We observe that the set {ori, rot} shows poor performance compared to the augmentations {ori, sher} and {ori, scal}. We hypothesize that the performance drop for {ori, rot} can be attributed to the fact that the CAE is not equipped with an appropriate symmetry inductive bias that would enable it to learn rotation-invariant features.

Table 2 Effect of augmentation on performance of the original data. Original, rotated, sheared, scaled data are represented as ori, rot, sher and scal, respectively

5.1 Ablation Study

In this section, we perform an ablation study using DEC [16] and IDEC [5] with and without augmentation. In Table 3, we provide the ablation study of DEC, an unsupervised clustering technique that jointly optimizes the cluster centers and the parameters of the CAE; its objective function is optimized via the KL-divergence between the auxiliary target distribution and the soft cluster assignments. From Table 3, we infer that training the CAE with augmented data and then running DEC on the original data increases accuracy by 5.61%, whereas providing augmented data to DEC with a CAE trained on original data hinders performance. Hence, only the CAE is trained with the original and augmented data, ensuring the objective is met.
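For reference, a minimal sketch of the DEC objective described above, following the formulation in [16] rather than the authors' exact implementation: the soft assignments q are sharpened into a target distribution p, and KL(p || q) is minimized.

```python
import torch
import torch.nn.functional as F

def target_distribution(q):
    """DEC auxiliary target distribution: square the soft assignments
    q (N x K) and normalize by per-cluster frequency (see [16])."""
    weight = q ** 2 / q.sum(dim=0)
    return (weight.t() / weight.sum(dim=1)).t()

def dec_loss(q):
    """KL-divergence between the target distribution p and the assignments q."""
    p = target_distribution(q).detach()
    return F.kl_div(q.log(), p, reduction="batchmean")
```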

Table 3 Comparing results of the proposed methodology with DEC [16]. CAE-WAug and CAE-Aug refer to the CAE trained without and with augmentation, respectively. DEC-WAug and DEC-Aug refer to DEC trained without and with augmentation, respectively. We show how the combination of augmentation applied to CAE and DEC may affect clustering performance
Table 4 Comparing results of the proposed methodology with IDEC [5]. CAE-WAug and CAE-Aug refer to the CAE trained without and with augmentation, respectively. IDEC-WAug and IDEC-Aug refer to IDEC trained without and with augmentation, respectively. We show how the combination of augmentation applied to CAE and IDEC may affect clustering performance

In Table 4, we provide the ablation study of Improved Deep Embedded Clustering (IDEC). IDEC is an improvement over DEC that not only jointly optimizes the cluster centers and the parameters of the CAE but also preserves the local structure of the data, using the KL-divergence between the auxiliary target distribution and the soft assignments together with the MSE loss of the CAE as its objective. From Table 4, it can be observed that training the CAE with augmented data and the MSE+CCL loss, and then providing the trained CAE to IDEC run on the original data, improves performance with an increase in accuracy of 7.43%, whereas training IDEC with augmented data while the CAE is trained only on original data hinders performance.

From these experiments we observe that it is best to train the CAE on augmented data with the combined MSE+CCL loss. The CAE trained in this setting is then used to provide the initial feature representation to IDEC on the original data, which performs better than the other methods.

6 Conclusions

In this paper, we have presented data augmentation as an auxiliary plug-in for deep embedded clustering that densifies the data and improves clustering performance. We have demonstrated how data augmentation increases data density, yielding superior clustering performance when the available data is limited. Extensive experimentation was carried out to set up the right objective function. Our main objective is to cluster data with low inter-class variance and high intra-class variance, and we have demonstrated our experiments on a crowdsourced heritage dataset. We also show how certain augmentation techniques uphold the clustering objective (random shear and random scale), while others hinder it (random rotation). We demonstrate our results on the Indian Digital Heritage (IDH) dataset to show that our methodology performs better than state-of-the-art clustering algorithms. Deploying clustering algorithms for critical applications warrants circumspection and is still a work in progress; we believe our work is a step in this direction.