1 Introduction

Data mining has been a research hotspot for decades because of its significance in various application fields. Cluster analysis, as one of the most important data mining technologies, has continuously produced new algorithms [2, 35, 49, 54] and is widely applied in a variety of scenarios, such as social network analysis [9, 32, 33, 53], community detection [15, 45, 50, 57, 60], computer vision [34, 38, 46], natural language processing [24, 28] and knowledge discovery [20, 58, 59]. Clustering and classification are the two most important categories of machine learning, and their major difference is whether the learning is supervised or not. The objective of clustering is to pull similar data points (according to a specific metric in the extracted feature space) into the same clusters, while data points with highly distinct features are kept far apart.

Initially, clustering only categorized unlabeled data and was a branch of unsupervised learning. Unsupervised clustering has drawn a tremendous amount of research attention, and many clustering methods have been proposed [11, 12, 19, 51, 55]. These methods can be generally categorized into three types: (1) Feature learning based methods, which try to find more discriminative features by combining dimension reduction [39, 55] or subspace learning [1, 11] techniques. (2) Metric learning based methods, which aim to learn an appropriate distance metric for the training data; under the learned metric, similar samples are grouped together while dissimilar samples are separated [19, 22, 42]. (3) Graph based clustering, which partitions the data into different classes according to pairwise similarities [10, 48]. Recently, deep learning has achieved great success in many fields due to its superior learning capacity, and some deep learning based methods [26, 40, 52] have been used to solve clustering problems.

Generally speaking, extracting useful features and learning an appropriate metric for high-dimensional data without any supervised information is a challenging task. Consequently, supervised clustering algorithms [13, 17, 56] have been proposed to improve clustering results. However, most of these methods are of limited use in real applications, because it is almost impossible for all data to be labeled, and manually tagging enough samples is an unrealistic waste of human resources and time. In fact, in most real-world applications we can only obtain limited labeled data while most of the data remain unlabeled. Motivated by these problems, semi-supervised clustering methods [4, 14, 44] have emerged more recently. These methods adjust the learning framework through limited labeled data so that clustering can be executed within a supervised framework, which greatly improves clustering performance and has wide applicability.

1.1 Motivation

Although existing semi-supervised clustering algorithms have achieved good results, two important issues still hinder clustering performance. (i) Most of these methods extract features or learn a distance metric through traditional SVMs, shallow neural networks or linear mappings, which limits their performance. (ii) They only use the labeled data to guide the clustering process, so they cannot make full use of the traits of the data, especially the unlabeled data.

Motivated by these problems, we propose semi-supervised clustering with deep metric learning (SCDML), which extracts discriminative features using a deep metric learning model. At the same time, the unlabeled data are also used to optimize the clustering result: a k-nearest neighbors label updating strategy dynamically enlarges the labeled data set, which in turn promotes the performance of the metric learning network. Figure 1a illustrates existing semi-supervised clustering models, which train the network model with a fixed input, while our model is constantly improved by updating the labeled data incrementally, as shown in Figure 1b.

Figure 1

The difference between existing semi-supervised learning methods and our proposed label propagation model

To further improve the performance of SCDML, through extensive analysis and experiments we found that: (i) SCDML takes Siamese CNNs as the metric learning network, in which the contrastive loss function is used to optimize the network. The main objective of the contrastive loss is to reduce the distance between positive samples and increase the distance between negative samples. However, the contrastive loss treats positive and negative samples equally and ignores the difficulty that negative samples bring to metric learning. (ii) In the label propagation process, the k-nearest neighbors of each cluster center are tagged as new labeled data, which does not fully utilize the results of the deep metric learning network. In addition, the parameter k is difficult to predefine, so simply selecting the k unlabeled data nearest to the center of the labeled data in each cluster is crude. Therefore, we further improve our SCDML approach in these two respects.

1.2 Contributions

The key contributions of our work can be summarized as below:

(1) In this work, we design a novel semi-supervised clustering model, which includes a semi-supervised deep metric learning subnetwork and a labeling propagation subnetwork. To the best of our knowledge, the proposed method is among the first to address the clustering task by combining deep metric learning with semi-supervised learning techniques.

(2) In the metric learning subnetwork, we integrate Siamese CNNs to extract discriminative features that minimize the clustering error.

(3) In the labeling propagation subnetwork, we design a k-nearest neighbors label updating strategy to transform unlabeled data into labeled data, which reinforces the ability of the metric learning network.

(4) We have conducted extensive experiments on three datasets to demonstrate the effectiveness of our proposed approach. Experimental results show that our approach is a robust competitor to state-of-the-art clustering methods.

Note that we presented our preliminary study of deep semi-supervised clustering in prior work [29] as an abstract paper. In this article, we make significant revisions and add substantial new material. Specifically, this article makes the following new contributions:

(1) We provide a systematic analysis of SCDML and a more comprehensive review of the related work.

(2) To obtain more discriminative and robust features, we improve our model by applying the triplet loss as the metric learning network's loss function instead of the contrastive loss. The triplet CNNs take three labeled samples (an anchor, a positive sample and a negative sample) as input. Under the triplet loss, the positive sample is pulled closer to the anchor while the negative sample is simultaneously pushed away from it, so that the labeled data form clusters in the learned feature space.

(3) To reinforce the reliability of newly labeled data, we propose a more reasonable and effective labeling propagation network. Specifically, by combining the result of the classification network with the result of our improved graph clustering algorithm, unlabeled data can be transformed from weakly labeled into strongly labeled data. The newly added strongly labeled data positively feed back into the deep metric learning and classification network, improving the accuracy of the metric learning network.

(4) We have conducted extensive experiments on four datasets and compared our approaches with more competing methods. In addition, we evaluate the effectiveness of our approaches against two variants and provide more verification experiments.

2 Related work

2.1 Clustering methods

In this subsection, we briefly introduce the background of clustering methods, including feature learning based methods, metric learning based methods, graph based methods and deep learning based methods.

Feature based clustering

Feature based clustering divides the dataset into clusters according to the data's features. The k-means clustering algorithm [18] is a classical feature based unsupervised learning method. It aims to minimize the following objective function:

$$ J=\sum\limits_{j=1}^{k}\sum\limits_{i=1}^{n}||{x_{i}^{j}}-c_{j}||^{2} $$

where \(||{x_{i}^{j}}-c_{j}||^{2}\) indicates the squared Euclidean distance between the data point \({x_{i}^{j}}\) and the cluster center \(c_{j}\).
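As a concrete illustration, the following minimal sketch (Python with NumPy; the function name and defaults are our own, not from the paper) minimizes this objective with the classical Lloyd iteration, alternating an assignment step and a center-update step:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal Lloyd's-algorithm sketch for the k-means objective J."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # initial centers c_j
    for _ in range(n_iters):
        # Assignment: each x_i joins the nearest center under Euclidean distance.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update: each center moves to the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged: J can no longer decrease
        centers = new_centers
    return labels, centers
```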

Many more efficient variants of k-means have been proposed in the last few decades. Saha et al. [39] proposed a model that performs clustering according to feature selection and fuzzy data simultaneously. In [55], an adaptive hashing method based on feature clustering is proposed to reduce data dimensionality.

Metric learning based clustering

Metric learning can autonomously learn a distance metric function suited to a specific task. A common metric distance function is defined as follows:

$$ d_{M}(x,x^{\prime})=\sqrt{(x-x^{\prime})^{T}M(x-x^{\prime})} $$

where \(M\in \mathbb {R}^{d\times d}\) is called the metric matrix; when M is the inverse of the covariance matrix \({\Sigma}\), this is the Mahalanobis distance. Obviously, M is a symmetric matrix.
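To make the definition concrete, here is a small NumPy sketch (our own illustration) that evaluates \(d_{M}\); with M set to the inverse covariance it gives the Mahalanobis distance, and with M equal to the identity it reduces to the Euclidean distance:

```python
import numpy as np

def metric_distance(x, x_prime, M):
    """d_M(x, x') = sqrt((x - x')^T M (x - x')) for a symmetric PSD matrix M."""
    diff = x - x_prime
    return np.sqrt(diff @ M @ diff)

X = np.random.randn(200, 5)                  # toy data
M = np.linalg.inv(np.cov(X, rowvar=False))   # M = Sigma^{-1}: Mahalanobis distance
d = metric_distance(X[0], X[1], M)
```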

Kalintha et al. [22] proposed a non-linear transformation for distance metric learning in clustering, which performs well on non-linearly separable data. Heidari et al. [19] proposed a probabilistic model that combines fuzzy clustering and metric learning to maximize the inter-cluster distance and minimize the intra-cluster distance.

Graph based methods

As one of the most popular clustering techniques in recent years, graph clustering has attracted many researchers, and various graph based clustering methods have been proposed [30, 31, 46, 47]. These methods represent entities as vertices in an undirected graph with weighted edges describing the relationships between entities. In [10], Chen et al. proposed a sparse representation method for graph clustering. In [48], Xie et al. proposed multi-view graph clustering with global and local graph embedding.

Deep learning based methods

These methods can learn more discriminative and robust features by using convolutional neural networks (CNNs) [6, 7, 21]. In [40], Sekmen et al. combined subspace clustering with CNNs to train a deep subspace clustering model. In [27], a nonlinear embedding model is proposed to learn a new representation of examples, so that elements of the same category are organized into the same cluster. In [8], Chen et al. proposed a deep nonparametric clustering method in which deep learning is used for feature extraction and dimension reduction. Compared with these methods, our proposed approach is semi-supervised and can improve clustering performance with supervised information.

2.2 Semi-supervised learning

Semi-supervised learning is a machine learning technique for improving the performance of a trained model [4, 25, 44]. Different from unsupervised learning, semi-supervised learning trains the model by utilizing a few labeled samples and abundant unlabeled data. Guan et al. [14] proposed a feature space learning model based on a semi-supervised framework to better understand and learn the feature space. In [37], Laine et al. proposed a temporal ensembling model for semi-supervised learning. In [43], a local density model is proposed to measure the similarity between k-nearest vertices. Kang et al. [23] combined multiple kernel learning with semi-supervised techniques to tackle the clustering problem. Compared with these traditional semi-supervised clustering methods, our approach can learn more meaningful and discriminative features, which benefit the subsequent clustering.

Recently, some deep semi-supervised clustering methods have been proposed [3, 36, 41]. In [3], Arshad proposed semi-supervised deep fuzzy C-means clustering (DFCM). In [36], Ren et al. proposed a semi-supervised deep embedded model for clustering. In [41], Shukla et al. designed a ClusterNet model, which uses pairwise semantic constraints to drive the clustering. However, our approach differs from these methods in two aspects. (i) We make full use of the unlabeled data instead of only using them for regularization. (ii) We employ a label propagation strategy to tag more unlabeled data, while in these methods the number of labeled data is fixed.

3 SCDML

To extract more discriminative features for optimizing the clustering model, we apply Siamese CNNs and take the contrastive loss as the metric learning loss function. We also propose a k-nearest neighbors label updating strategy that dynamically transforms unlabeled data into labeled data, fully exploiting the contribution of the unlabeled data.

3.1 Semi-supervised deep metric learning network

We design a semi-supervised deep metric learning network based on Siamese CNNs, as shown in Figure 2a.

Figure 2

Illustration of the SCDML approach. The approach consists of two steps: (1) extract discriminative features by Siamese CNNs (the left); (2) obtain more labeled data by k-nearest neighbors algorithm (the right)

First, we feed labeled sample pairs to the Siamese CNNs to extract discriminative features. In the feature learning process, we take the contrastive loss as the objective function of our network. The loss function is computed as follows:

$$ L= y||x_{1}-x_{2}||_{2}^{2}+(1-y)max\left( \alpha-||x_{1}-x_{2}||_{2}^{2},0\right) $$
(1)

where \(||x_{1}-x_{2}||_{2}\) is the Euclidean distance between \(x_{1}\) and \(x_{2}\), which represent the features of the input sample pair extracted by the metric learning network, and \(y \in \{0,1\}\) is the corresponding label of the input pair (1 if the pair comes from the same class, 0 if it comes from different classes).
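A minimal PyTorch sketch of the loss in (1) (the function name and margin default are our own illustration, not from the paper) looks as follows:

```python
import torch

def contrastive_loss(x1, x2, y, alpha=1.0):
    """Contrastive loss of Eq. (1). x1, x2: (batch, dim) embeddings from the
    two Siamese branches; y: 1 for same-class pairs, 0 for different-class pairs."""
    d2 = ((x1 - x2) ** 2).sum(dim=1)                  # squared Euclidean distance
    pos = y * d2                                      # pull same-class pairs together
    neg = (1 - y) * torch.clamp(alpha - d2, min=0.0)  # push different-class pairs beyond margin alpha
    return (pos + neg).mean()
```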

Then, we encode all the data, both labeled and unlabeled, through the trained metric learning network to obtain their features.

Finally, we classify the unlabeled data according to the encoded features and record the classification results as the labels of the unlabeled data.

3.2 k-nearest neighbors label updating strategy

In this subsection, we propose a k-nearest neighbors label updating strategy to transform the unlabeled data into labeled data.

As discussed above, all the data are classified into C clusters, and each cluster contains a limited amount of labeled data and a lot of unlabeled data. To make full use of the features of the unlabeled data, we add kC new unlabeled data to the labeled dataset each time. The main process of the k-nearest neighbors label updating strategy is as follows.

Step 1:

Compute the center of each cluster according to the labeled data.

$$ c_{i}=\frac{1}{N_{c_{i}}^{l}}\sum\limits_{j=1}^{N_{c_{i}}}\left\{\left( {s}_{j}^{l},l_{j}\right)|l_{j}=i\right\} $$
(2)

where \({s}_{j}^{l}\) is a labeled sample, \(N_{c_{i}}^{l}\) is the number of labeled samples in cluster \(c_{i}\), \(N_{c}\) is the number of clusters, and \(l_{j}\) is the label of sample \({s}_{j}^{l}\).

Step 2:

Search for the k unlabeled data nearest to the center of the labeled data in each cluster, and then update their attributes from unlabeled to labeled. The newly added labeled data ΔS in cluster \(c_{i}\) can be computed by:

$$ {\varDelta} S=Sort\left( \left\{Dis\left( \left( {s_{j}^{u}},l_{j}\right)|l_{j}=i,c_{i}\right)\right\}, k\right) $$
(3)

where \(({s_{j}^{u}},l_{j})|l_{j}=i\) indicates the unlabeled data \({s_{j}^{u}}\) in the ith cluster, \(Dis(\cdot ,\cdot )\) is the distance function, and \(Sort(X,k)\) sorts the elements of X in ascending order and returns the top k elements.

For example, in Figure 2b the solid points represent labeled data and the hollow points represent unlabeled data. After finding the center of the labeled data in each cluster, each cluster generates three new labeled data, which are the unlabeled samples nearest to this center.
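The two steps above can be sketched as follows (NumPy; the function signature, in particular the per-cluster assignment array assign_unlab, is our own illustrative assumption):

```python
import numpy as np

def knn_label_update(X_lab, y_lab, X_unlab, assign_unlab, k, n_clusters):
    """One round of the k-nearest-neighbors label updating strategy (Eqs. 2-3).

    Returns, per cluster i, the indices of the k unlabeled points in that
    cluster that lie closest to the center of its labeled data.
    """
    promoted = {}
    for i in range(n_clusters):
        center = X_lab[y_lab == i].mean(axis=0)        # Eq. (2): labeled center c_i
        idx = np.where(assign_unlab == i)[0]           # unlabeled points in cluster i
        dists = np.linalg.norm(X_unlab[idx] - center, axis=1)
        promoted[i] = idx[np.argsort(dists)[:k]]       # Eq. (3): k nearest to c_i
    return promoted
```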

As the number of labeled data increases, our proposed metric model can learn more robust and discriminative features, which will further improve the accuracy of the clustering.

4 Improved semi-supervised clustering with deep metric learning

As discussed in the motivation of Section 1, two factors still affect clustering performance: (i) the choice of metric function influences the accuracy of feature extraction and thus the accuracy of the clustering results; (ii) in practical applications, the k-nearest neighbors label updating strategy is not always suitable, due to the varying density of each cluster and the number and distribution of labeled data within it. Moreover, the choice of the parameter k also hinders the effectiveness of the algorithm. To enhance the performance of SCDML for semi-supervised clustering, we improve our SCDML approach in two respects: (i) we take triplet CNNs as the metric learning model and employ the triplet loss function to train the network; (ii) we design a more reasonable label propagation network to transform unlabeled data into labeled data dynamically.

The framework of the improved semi-supervised clustering with deep metric learning (SCDMLGE) is shown in Figure 3. It contains two subnetworks: a semi-supervised deep metric learning and classification network, and a labeling propagation network. The following subsections present the details of our proposed approach.

Figure 3

The framework of the clustering with deep semi-supervised metric learning. The framework consists of two subnetworks: (1) a feature extraction subnetwork by Triplet CNNs (the left); (2) a label propagation subnetwork by graph clustering algorithm (the right)

4.1 Semi-supervised deep metric learning and classification network

Unlike the metric learning network used in our previous work, here we apply a triplet network, whose inputs contain an anchor, a positive sample and a negative sample. As discussed above, the contrastive loss treats positive and negative samples equally and ignores the difficulty that negative samples bring to metric learning, so we introduce the triplet loss function into our CNN model, which pushes negative samples away from the anchor and pulls positive samples closer to it.

After training the triplet metric learning network, the distance between the anchor and the positive sample is shortened while the negative sample is simultaneously pushed away from the anchor. Therefore, clusters form better in this feature space.

The main training process of the network consists of the following three steps.

Step 1:

Train the network with the labeled triplets. First, extract discriminative features through the triplet CNNs, then use the features to train a classifier. To train the feature extraction network and the classification network at the same time, we design the loss function of the semi-supervised deep metric learning and classification network as follows:

$$ \min L=L_{M}+ \lambda_{1} L_{C} +\lambda_{2} \lVert W{\rVert_{F}^{2}}, $$
(4)

where λ1 and λ2 are tunable positive parameters, \({\lVert W\rVert _{F}^{2}}\) is a regularization term to prevent overfitting, and LM and LC are the metric learning loss and the classification loss, respectively. They are computed as follows:

$$ \begin{array}{@{}rcl@{}} L_{M}&=&\frac{1}{N} \sum\limits_{i=1}^{N}\{max(||f({x_{i}^{a}})-f({x_{i}^{p}})||_{2}\\ &&-||f\left( {x_{i}^{a}}\right)-f({x_{i}^{n}})||_{2}+\alpha, 0)\} \end{array} $$
(5)

where \(f({x_{i}^{a}})\), \(f({x_{i}^{p}})\) and \(f({x_{i}^{n}})\) indicate the features of the anchor, positive sample and negative sample, respectively, and α is the minimum margin between \(||f({x_{i}^{a}})-f({x_{i}^{p}})||_{2}\) and \(||f({x_{i}^{a}})-f({x_{i}^{n}})||_{2}\).

$$ L_{C}=-\sum\limits_{f(x)}p(f(x))\log q(f(x)) $$
(6)

where p(f(x)) is the expected output and q(f(x)) is the actual output of the classification network.
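Putting Eqs. (4)-(6) together, one loss evaluation could be sketched in PyTorch as below. The helper name, the margin value, and the λ values (chosen inside the stable ranges reported in Section 5.3.3) are our own illustration:

```python
import torch
import torch.nn.functional as F

def scdmlge_loss(f_a, f_p, f_n, logits, targets, net,
                 alpha=0.2, lam1=0.5, lam2=0.03):
    """Joint objective of Eq. (4): L = L_M + lam1 * L_C + lam2 * ||W||_F^2.

    f_a, f_p, f_n: embeddings of anchor / positive / negative samples;
    logits, targets: classifier outputs and labels for the labeled batch.
    """
    # Triplet loss L_M of Eq. (5): enforce margin alpha between the two distances.
    d_ap = F.pairwise_distance(f_a, f_p)
    d_an = F.pairwise_distance(f_a, f_n)
    L_M = torch.clamp(d_ap - d_an + alpha, min=0.0).mean()
    # Cross-entropy classification loss L_C of Eq. (6).
    L_C = F.cross_entropy(logits, targets)
    # Squared Frobenius norm of the network weights as the regularizer.
    reg = sum((p ** 2).sum() for p in net.parameters())
    return L_M + lam1 * L_C + lam2 * reg
```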

Step 2:

Encode the labeled and unlabeled data. Assume that \(S_{l} = \{(s_{li},l_{li})|i = 1,2,\dots ,N_{l}\}\) and \(S_{u} = \{s_{ui}|i = 1,2,\dots ,N_{u}\}\) represent the initial labeled and unlabeled data, respectively, where \(N_{l}\) is the number of labeled samples, \(N_{u}\) is the number of unlabeled samples, and \(l_{li} \in \{1,2,\dots ,C\}\), with C the number of classes. We use \(S_{l}^{\prime } = \{s_{li}^{\prime }|i = 1,2,\dots ,N_{l}\}\) and \(S_{u}^{\prime } = \{s_{ui}^{\prime }|i = 1,2,\dots ,N_{u}\}\) to represent the outputs of \(S_{l}\) and \(S_{u}\) from the CNNs.

Step 3:

Tag the unlabeled data according to the classification network. \(S_{u}\) can then be denoted as \(S_{u} = \{(s_{ui},l_{ui}^{1})|i = 1,2,\dots ,N_{u}\}\), where \(l_{ui}^{1}\) is the classification label of \(s_{ui}\).

4.2 Semi-supervised clustering labeling propagation network

Through the deep metric learning and classification network, we obtain a label for each unlabeled sample, called a weak label. To acquire strong labels for the unlabeled data, we design a semi-supervised labeling propagation network. It includes two parts: semi-supervised clustering and labeling propagation.

In the process of the semi-supervised clustering, we propose an improved graph clustering algorithm. The details of the algorithm are as follows:

Firstly, we compute the similarity matrix W according to the following equation:

$$ \begin{array}{@{}rcl@{}} w_{ij}=\left\{\begin{array}{ll} exp\frac{-||x_{i}-x_{j}||^{2}}{2\sigma^{2}}&{\{x_{i},x_{j}\} \in {S}^{\prime}_{u} }\\ 1& {\{x_{i},x_{j}\}\in {S}^{\prime}_{l} } \land \{l_{x_{i}} = l_{x_{j}}\} \\ 0& {\{x_{i},x_{j}\} \in {S}^{\prime}_{l} \land \{l_{x_{i}} \ne l_{x_{j}}\}} \end{array}\right. \end{array} $$
(7)

where σ represents the neighborhood width of the sample points; the larger σ is, the greater the similarity between sample points.

Secondly, we use the following formula to calculate the degree matrix D:

$$ d_{i}=\sum\limits_{j=1}^{n}w_{ij}, $$
(8)

and then we can obtain the corresponding Laplacian matrix.

$$ L=D-W, $$
(9)

Next, we use the top k eigenvectors \(u_{1},u_{2},\dots ,u_{k}\) of L to form a new matrix U, and then obtain the clustering results by applying the k-means clustering algorithm to U.
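The whole procedure of Eqs. (7)-(9) can be sketched as below (NumPy/SciPy; our own illustration). Two assumptions here are ours: similarities between labeled and unlabeled points fall back to the Gaussian kernel, and the "top k eigenvectors" of the unnormalized Laplacian are taken to be those with the smallest eigenvalues, as is conventional in spectral clustering:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def semi_supervised_graph_embedding(X_l, y_l, X_u, k, sigma=1.0):
    """Build W, D, L of Eqs. (7)-(9) and return the spectral embedding U."""
    X = np.vstack([X_l, X_u])
    n_l = len(X_l)
    # Eq. (7), unlabeled case: Gaussian similarity (also used here for mixed pairs).
    W = np.exp(-cdist(X, X, 'sqeuclidean') / (2 * sigma ** 2))
    # Eq. (7), labeled cases: 1 for same-label pairs, 0 otherwise.
    W[:n_l, :n_l] = (y_l[:, None] == y_l[None, :]).astype(float)
    D = np.diag(W.sum(axis=1))    # Eq. (8): degree matrix
    L = D - W                     # Eq. (9): unnormalized Laplacian
    # k eigenvectors of L with the smallest eigenvalues form the columns of U;
    # the rows of U are then clustered with k-means.
    _, U = eigh(L, subset_by_index=[0, k - 1])
    return U
```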

Finally, we mark \(S_{u}^{\prime }\) according to the clustering results, recorded as \(S_{u}^{\prime } = \{(s_{ui}^{\prime },l_{ui}^{2})|i = 1,2,\dots ,N_{u}\}\), where \(l_{ui}^{2}\) is the clustering label of \(s_{ui}^{\prime }\).

Once both the classification label and the clustering label of the unlabeled data \(S_{u}\) are obtained, we can apply the labeling propagation strategy. Let ΔS denote the newly added strong-label data; it is acquired by:

$$ {\varDelta} S=\left\{s_{ui}|\left( l_{ui}^{1}=l_{ui}^{2}\right)\right\}, $$
(10)

According to (10), we can update \(S_{l}\) and \(S_{u}\) until all of the unlabeled data are transformed into labeled data:

$$ \begin{array}{@{}rcl@{}} S_{l}&=&S_{l}+ {\varDelta} S\\ S_{u}&=&S_{u}- {\varDelta} S \end{array} $$
(11)

Algorithm 1 summarizes the main process of our SCDMLGE approach. It trains a classifier on the labeled data through the semi-supervised deep metric learning and classification network, and then obtains the classification labels \(L_{u}^{1}\) of the unlabeled data (lines 3~4). We get the clustering labels \(L_{u}^{2}\) of the unlabeled data by applying our improved graph clustering, then compare \(L_{u}^{1}\) with \(L_{u}^{2}\) to update the labeled dataset (lines 5~7). The algorithm terminates when all the unlabeled data have been transformed into strong-label data, when the current iteration error is less than the minimum threshold ε, or when the number of iterations reaches the maximum value T.

Algorithm 1: The main process of SCDMLGE
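Since the pseudocode of Algorithm 1 appears only as a figure, the following high-level Python sketch restates its loop. The callables train_step, classify and graph_cluster are hypothetical stand-ins for the two subnetworks described above, and graph_cluster is assumed to return cluster labels already aligned with the class labels (e.g., via the labeled data):

```python
import numpy as np

def scdmlge(S_l, y_l, S_u, train_step, classify, graph_cluster, eps=1e-3, T=50):
    """Sketch of Algorithm 1 over NumPy arrays, with hypothetical helpers."""
    for _ in range(T):                       # stop after at most T iterations
        if len(S_u) == 0:
            break                            # all data carry strong labels
        train_step(S_l, y_l)                 # train metric + classifier on labeled data
        l1 = classify(S_u)                   # classification (weak) labels L_u^1
        l2 = graph_cluster(S_l, y_l, S_u)    # clustering labels L_u^2
        agree = l1 == l2                     # Eq. (10): strong-label candidates
        if agree.mean() < eps:
            break                            # too few promotions: treat as converged
        S_l = np.concatenate([S_l, S_u[agree]])  # Eq. (11): grow S_l ...
        y_l = np.concatenate([y_l, l1[agree]])
        S_u = S_u[~agree]                        # ... and shrink S_u
    return S_l, y_l, S_u
```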

5 Experiments

5.1 Datasets and compared methods

Datasets

We conduct experiments on four publicly available datasets: Mnist, CIFAR-10 [36], YaleB [11] and 20-Newsgroups [8]. The Mnist dataset consists of 70000 images of hand-written digits from 0 to 9 and is widely used for character recognition. The CIFAR-10 dataset consists of 60000 images in 10 categories, with 6000 samples per category. The YaleB dataset has 2414 grayscale face images of 38 persons; each person has 64 samples captured from five different angles. The 20-Newsgroups dataset is often used in text and document classification and contains 18846 documents labeled into 20 categories.

Compared methods

To evaluate the efficacy of the proposed approaches, we compare our approaches with some state-of-the-art related methods including:

(1) traditional unsupervised methods: FCH [55], SC-CNMF [11];

(2) traditional semi-supervised (supervised) methods: FSLSC [14], SMKL [23];

(3) deep unsupervised methods: DCN [5], IDEC [16];

(4) deep semi-supervised methods: DFCM [3], SDEC [36], ClusterNet [41].

5.2 Evaluation measures and experimental settings

To evaluate the performance of our proposed methods and the compared methods, we use two measures, namely clustering accuracy (AC) and normalized mutual information (NMI). These measures are widely used to evaluate clustering performance [5, 16, 42, 55].

AC can be computed as follows:

$$ AC= \frac{1}{N}\sum\limits_{i=1}^{K}\max(C_{i}|L_{i}), $$
(12)

where N is the number of samples to be clustered, K is the number of clusters, \(L_{i}\) is the true label information, and \(C_{i}\) is the predicted label information from the clustering algorithm.
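In practice, AC is computed by optimally matching predicted clusters to true classes; a standard sketch using the Hungarian algorithm (our own illustration, assuming this is the intent of Eq. (12) and that labels are integer-coded NumPy arrays) is:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """AC of Eq. (12) via optimal cluster-to-class matching."""
    k = max(y_true.max(), y_pred.max()) + 1
    count = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1                        # contingency table
    rows, cols = linear_sum_assignment(-count)  # maximize total matched samples
    return count[rows, cols].sum() / len(y_true)
```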

NMI can be computed as follows:

$$ NMI(A,B)= \frac{MI(A,B)}{\sqrt{H(A)H(B)}}, $$
(13)

where A is the true cluster set and B is the predicted cluster set. MI(A,B) is the mutual information between A and B, and H(A) and H(B) denote the entropies of A and B. The range of NMI is from 0 (A is independent of B) to 1 (A is equivalent to B).
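NMI can be computed directly; for instance, scikit-learn's implementation with geometric averaging matches the \(\sqrt {H(A)H(B)}\) normalization of Eq. (13) (labels_true and labels_pred below are placeholder arrays):

```python
from sklearn.metrics import normalized_mutual_info_score

# 'geometric' averaging normalizes MI by sqrt(H(A) * H(B)), as in Eq. (13).
nmi = normalized_mutual_info_score(labels_true, labels_pred,
                                   average_method='geometric')
```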

5.3 Results and analysis

5.3.1 Clustering performance evaluation

In this subsection, we conduct experiments to evaluate the clustering performance of our proposed semi-supervised clustering with deep metric learning approach, SCDML, and its improved version, SCDMLGE. Table 1 shows the clustering results on the Mnist, CIFAR-10, YaleB and 20-Newsgroups datasets. From the experimental results, we observe that: (i) our proposed SCDMLGE outperforms all of the state-of-the-art methods; (ii) SCDML achieves better performance than most of the compared methods.

Table 1 Clustering performance on Mnist, CIFAR-10, YaleB and 20-Newsgroups datasets (the percentage of labeled data is 10%). The best results are shown in bold

Specifically, compared with the traditional clustering methods FCH, SC-CNMF, FSLSC and SMKL, our approaches learn more meaningful and robust features by using deep metric learning. Moreover, FCH and SC-CNMF are unsupervised methods that do not use label information in the clustering process, which further weakens their performance. Compared with the deep clustering methods DCN, IDEC, DFCM and SDEC, the reasons for the performance improvement are as follows: DCN and IDEC ignore the information in the labeled data; DFCM uses the unlabeled data only for regularization, which limits the performance of deep metric learning; SDEC adopts pairwise constraints to guide the clustering, which is similar to the contrastive loss. In addition, we can see that ClusterNet outperforms all other methods except our SCDMLGE approach.

The last two rows of Table 1 confirm that SCDMLGE is superior to SCDML. Since SCDMLGE uses triplet CNNs to train the deep metric network, it extracts more discriminative features than the Siamese CNNs adopted in SCDML. Better yet, SCDMLGE designs an improved labeling propagation network that transforms unlabeled data into labeled data more reasonably, making full use of the contribution of the unlabeled data to optimize the classification model.

5.3.2 Clustering performance evaluation with different percentages of labeled data

To further evaluate the clustering performance of our proposed approaches, we increase the percentage of labeled data from 0.5% to 10%. Tables 2 and 3 report the AC and NMI results, respectively, of our proposed SCDML and SCDMLGE approaches and the compared semi-supervised clustering methods on the four datasets. From these results we can clearly see that our SCDMLGE approach performs better than all compared semi-supervised clustering methods, which indicates that SCDMLGE learns more discriminative structural features while making full use of the unlabeled data.

Table 2 AC results of proposed methods and three semi-supervised clustering methods with different percentages of labeled data on Mnist, YaleB, CIFAR-10 and 20-Newsgroups datasets. The best results are shown in bold
Table 3 NMI results of proposed methods and three semi-supervised clustering methods with different percentages of labeled data on Mnist, YaleB, CIFAR-10 and 20-Newsgroups datasets. The best results are shown in bold

5.3.3 Evaluation of the influence of parameters

This subsection evaluates the impact of the important parameters λ1 and λ2 in SCDMLGE, taking the Mnist dataset as an example. We observe the performance variation of SCDMLGE as λ1 varies over [0.1,1] with a step size of 0.1 and as λ2 varies over [0.01,0.1] with a step size of 0.01. From the results in Figure 4a and b, SCDMLGE reaches stable, good clustering performance when λ1 is in [0.4,0.7] and λ2 is in [0.02,0.05]. Similar results are observed on the other datasets.

Figure 4

AC results of SCDMLGE versus different values of (a) parameter λ1 and (b) parameter λ2

5.4 Effectiveness of new strategies

SCDMLGE is the improved version of SCDML and mainly adopts two new strategies. To evaluate the effectiveness of these two improvements separately, we generate two modified versions of SCDML: (1) “SCDML+t”, a variant of SCDML that employs triplet CNNs as the deep metric learning model; (2) “SCDML+p”, a variant of SCDML that uses the new labeling propagation network to dynamically increase the labeled data.

Table 4 shows the effectiveness of our proposed new strategies. From the experimental results, we can see that the clustering performance of both SCDML+t and SCDML+p is better than that of SCDML, which means that the new strategies adopted in SCDMLGE are beneficial to clustering performance.

Table 4 Clustering performance of our proposed improving strategies on the Mnist, YaleB and 20-Newsgroups datasets (the percentage of labeled data is 10%)

6 Conclusion

In this paper, we propose a novel semi-supervised clustering with deep metric learning approach (SCDML) that extracts discriminative features with a deep metric learning network and makes full use of the features of unlabeled data. To further improve the effectiveness and practicability of SCDML, we propose an improved version, SCDMLGE, which embeds triplet CNNs in the deep metric learning network instead of a Siamese network and adds a new labeling propagation network. The semi-supervised deep metric learning network, adopting the triplet loss function, extracts more powerful features and thus learns a more discriminative metric. The labeling propagation network then labels new data, which is more suitable for real applications. Experimental results on the Mnist, CIFAR-10, YaleB and 20-Newsgroups datasets show the high performance and effectiveness of our proposed approaches, with SCDMLGE performing better than SCDML.

In our proposed approach, the labeled data must cover all classes, which may hinder its application value. In future work, we will further enhance the performance of our proposed method and apply it to the incremental clustering problem.