1 Introduction

In traditional supervised learning, each example consists of a single instance to which exactly one label is assigned; this setting is known as single-label classification. In many situations, however, a real-world object cannot be adequately described by a single label, and such problems are addressed by multilabel classification. The purpose of multilabel classification is to discover the set of labels for unseen instances; each input instance may be associated with multiple class labels. For example, a particular news article can simultaneously be an economics, politics, and business article.

Multilabel classification is used in several areas such as bioinformatics, text categorization, tag recommendation, image classification, direct marketing, medical diagnosis, query categorization, and protein function prediction [3, 4].

2 Related Work

Multilabel classification techniques are grouped into two categories: problem transformation methods and algorithm adaptation methods [5].

The problem transformation approach transforms a multilabel problem into a set of single-label problems and is algorithm independent. Several problem transformation methods, such as binary relevance (BR) [6], label powerset (LP) [5], and classifier chains (CC) [7], are used in multilabel classification; a minimal sketch of the BR idea is given below.
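To make the transformation concrete, the following Python sketch (an illustration, not code from the cited works) shows binary relevance: the multilabel target matrix is split into one independent binary problem per label, each of which can then be solved with any single-label classifier.

```python
import numpy as np

def binary_relevance_transform(Y):
    """Split an (n_instances, n_labels) 0/1 target matrix into one
    binary target vector per label, as binary relevance (BR) does."""
    Y = np.asarray(Y)
    return [Y[:, j] for j in range(Y.shape[1])]

# Example: 4 instances, 3 labels (hypothetical data)
Y = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 1]])
for j, y in enumerate(binary_relevance_transform(Y)):
    print(f"label {j}: {y}")  # each y is an ordinary binary target
```

By contrast, label powerset maps each distinct row of Y to one class of a single multiclass problem, and classifier chains train the binary problems in sequence, feeding earlier predictions into later classifiers.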

The algorithm adaptation approach extends existing machine learning algorithms so that they can handle multilabel data directly. Several algorithm adaptation methods have been proposed, including multilabel k-nearest neighbor (ML-kNN) [8], binary relevance k-nearest neighbor (BR-kNN) [9], and IBLR [10].

ML-kNN and BR-kNN do not consider label correlations, whereas IBLR considers label correlations but performs poorly in terms of hamming loss.

Liu and Cao [1] proposed the coupled k-nearest neighbor algorithm for multilabel classification (CK-STC). CK-STC extends ML-kNN with coupled label similarity [11], which enables it to handle label correlations. The advantage of CK-STC is that it considers label correlations and performs better than ML-kNN, IBLR, BSVM, and BR-kNN, but it is more computationally complex.

Like traditional classification, multilabel classification suffers from the curse of dimensionality, which can be addressed by feature selection. Feature selection chooses the relevant, most informative features from the original feature set. Traditional feature selection methods handle multilabel data by transforming the multilabel problem into a single-label problem and then applying single-label feature selection.

Lee et al. [12] proposed multivariate mutual information-based feature selection. Transforming a multilabel problem into a single-label problem damages the original label structure, which reduces classifier performance. A newer method, information gain feature selection for multilabel data [2], therefore performs feature selection on multilabel data directly.

The multilabel classification tools MEKA [13] and MULAN [14] are available for handling multilabel data. MEKA is a multilabel extension of the WEKA data mining tool, and MULAN provides a framework implementing many multilabel learning algorithms. In MULAN's feature selection methods, multilabel data is transformed into single-label data, and traditional single-label dimensionality reduction techniques are then applied to the result.

3 Implementation Details

3.1 FSVIG Multilabel Feature Selection Algorithm Details

The FSVIG multilabel feature selection algorithm is implemented using the information gain measure. Feature selection removes redundant and irrelevant features from the database; the technique used in this work is reported in [2]. This feature selection method uses a filter approach. Information gain serves as the metric for measuring the degree of correlation between features and the labels present in the database: the information gain (IG) between a feature and the entire label set quantifies the importance of that feature. IG is calculated from the label entropy H(L) and the feature entropy H(fi). A larger IG value indicates a stronger correlation between the feature and the labels. However, the information gain values for different label sets may not lie on the same measurement scale, so for comparison the information gain is normalized (SU). Figure 1 shows the pseudo-algorithm for FSVIG feature selection.

Fig. 1 Pseudo-algorithm for feature selection FSVIG
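As a concrete reference for the quantities above, the sketch below computes IG and SU for one discrete feature against the label set. It assumes the standard definitions IG(fi, L) = H(L) - H(L | fi) and the usual symmetric-uncertainty normalization SU = 2 * IG / (H(fi) + H(L)); the exact formulation used by FSVIG is the one given in [2] and may differ in detail.

```python
import numpy as np
from collections import Counter

def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(feature, labels):
    """IG(f, L) = H(L) - H(L | f) for one discrete feature.

    `labels` holds one hashable label-set value per instance
    (e.g. a tuple of 0/1 label indicators), so the label set is
    treated as a whole rather than label by label."""
    n = len(feature)
    h_cond = 0.0
    for v in set(feature):
        subset = [labels[i] for i in range(n) if feature[i] == v]
        h_cond += len(subset) / n * entropy(subset)
    return entropy(labels) - h_cond

def symmetric_uncertainty(feature, labels):
    """Normalize IG into [0, 1] so scores of different features
    and label sets are directly comparable."""
    denom = entropy(feature) + entropy(labels)
    return 2.0 * information_gain(feature, labels) / denom if denom > 0 else 0.0
```

Features can then be ranked by SU and the top-ranked subset retained, which is how a filter method such as this is typically applied.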

3.2 CK-STC Multilabel Classification Algorithm Details

The CK-STC algorithm for multilabel classification is reported in [1, 15]. In CK-STC, coupled label similarity is estimated from the intra-coupling similarity between label values \( w_{i}^{x} \) and \( w_{j}^{y} \) and the inter-coupling label similarity between label values \( w_{i}^{x} \) and \( w_{j}^{y} \) with respect to a feature value \( w_{p}^{z} \). Inter-coupling label similarity is calculated from the co-occurrence frequency (CF), and intra-coupling similarity is calculated from the occurrence frequency (F). For every instance in the training data, the k-nearest neighbors are computed, and coupled label similarity is taken into account when calculating the prior probabilities and frequency arrays. The k-nearest neighbors of an unseen instance are then determined, and for each label, statistical information is computed from these neighbors. Finally, the labels of the unseen instance are predicted via the maximum a posteriori (MAP) rule, which is based on Bayes' theorem.

Figure 2 shows the pseudo-algorithm for CK-STC multilabel classification:

Fig. 2 Pseudo-algorithm for multilabel classification CK-STC
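To illustrate the final prediction step, here is a minimal sketch of the ML-kNN-style MAP decision that CK-STC builds on. It uses plain neighbor counts; in CK-STC itself the counts and probabilities are weighted by the coupled label similarity of [1], which this sketch omits.

```python
import numpy as np

def smoothed_prior(y_col, s=1.0):
    """Prior P(H1) that a label is relevant, with Laplace smoothing s,
    estimated from one 0/1 column of the training label matrix."""
    return (s + np.sum(y_col)) / (2.0 * s + len(y_col))

def map_decision(c, prior_1, post_1, post_0):
    """MAP rule for one label of an unseen instance.

    c       : how many of its k nearest neighbors carry the label
    prior_1 : P(H1), prior probability that the label is relevant
    post_1  : post_1[c] = P(c neighbors carry the label | H1)
    post_0  : post_0[c] = P(c neighbors carry the label | H0)
    Returns 1 iff P(H1)P(c|H1) > P(H0)P(c|H0), i.e. Bayes' theorem
    with the shared evidence term cancelled."""
    return int(prior_1 * post_1[c] > (1.0 - prior_1) * post_0[c])
```

The arrays post_1 and post_0 correspond to the "frequency arrays" mentioned above: they are estimated by counting, for each training instance, how many of its k neighbors carry the label, separately for instances that do and do not carry it themselves.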

4 Experimental Setup

Table 1 shows information about the benchmark multilabel datasets, including the number of features, the total number of labels, and the number of instances. Experiments were performed on the Genbase, Medical, and Enron datasets.

Table 1 Dataset Description

Experiments were performed on an i5 processor under the Windows 7 operating system. The CK-STC multilabel classification algorithm and the FSVIG multilabel feature selection algorithm were implemented in MS Visual Studio .NET, with Visual Studio 10 as the development tool. In the experiments, the proposed method (CK-STC with feature selection) was compared with the existing CML-kNN method. Euclidean distance was used to compute the k-nearest neighbors. The experiments were run with k = 5, 7, and 9, and the results were averaged. Tenfold cross-validation was performed on the above datasets.
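The evaluation protocol can be summarized in a short sketch. The Python below is illustrative only (the actual implementation is in Visual Studio .NET, and `fit_predict` is a hypothetical stand-in for training and applying the classifier); it shows the averaging over k in {5, 7, 9} and the tenfold split described above.

```python
import numpy as np

def kfold_indices(n, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs for n_folds-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)
    for test in np.array_split(idx, n_folds):
        yield np.setdiff1d(idx, test), test

def run_experiment(X, Y, fit_predict, metric, ks=(5, 7, 9), n_folds=10):
    """Average `metric` over the three k values and ten folds.

    fit_predict(X_tr, Y_tr, X_te, k) -> predicted label matrix for X_te.
    """
    scores = [metric(Y[te], fit_predict(X[tr], Y[tr], X[te], k))
              for k in ks
              for tr, te in kfold_indices(len(X), n_folds)]
    return float(np.mean(scores))
```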

5 Experimental Results

The performance of multilabel classification is measured in terms of hamming loss, one-error, and average precision. For each evaluation metric, "\( \downarrow \)" indicates that a smaller value is better and "\( \uparrow \)" indicates that a bigger value is better. Bold values indicate the winning classifier. Experiments were carried out in four sets, as follows:
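For reference, minimal implementations of the three metrics are sketched below under their common textbook definitions (an assumption on my part; the paper does not spell out its exact formulas). Hamming loss works on 0/1 predictions, while one-error and average precision work on real-valued label scores.

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of misclassified instance-label pairs (lower is better)."""
    return float(np.mean(Y_true != Y_pred))

def one_error(Y_true, scores):
    """Fraction of instances whose top-scored label is not relevant
    (lower is better)."""
    top = np.argmax(scores, axis=1)
    return float(np.mean(Y_true[np.arange(len(Y_true)), top] == 0))

def average_precision(Y_true, scores):
    """For each instance, average over its relevant labels of the
    fraction of labels ranked at or above it that are relevant;
    then average over instances (higher is better)."""
    per_instance = []
    for y, s in zip(Y_true, scores):
        order = np.argsort(-s)                # labels by decreasing score
        ranks = np.flatnonzero(y[order]) + 1  # 1-based ranks of relevant labels
        if ranks.size:
            per_instance.append(np.mean(np.arange(1, ranks.size + 1) / ranks))
    return float(np.mean(per_instance))
```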

(1) Multilabel classification using CK-STC without feature selection

Table 2 shows the results of the CK-STC algorithm.

Table 2 Results of CK-STC

(2) Multilabel feature selection using the FSVIG algorithm

Table 3 shows the results of FSVIG feature selection.

Table 3 Results of feature selection as per FSVIG
(3) Multilabel classification using the MULAN binary relevance attribute evaluator feature selection technique and its comparison with CK-STC with FSVIG feature selection

The results of the FSVIG multilabel feature selection algorithm and the MULAN binary relevance attribute evaluator are presented in Tables 4, 5, and 6.

Table 4 Comparisons of CK-STC, feature selection FSVIG with MULAN binary relevance attribute evaluator w.r.t. hamming loss \( \downarrow \)
Table 5 Comparisons of CK-STC, feature selection FSVIG with MULAN binary relevance attribute evaluator w.r.t. one-error \( \downarrow \)
Table 6 Comparisons of CK-STC, feature selection FSVIG with MULAN binary relevance attribute evaluator w.r.t. average precision \( \uparrow \)

Table 4 indicates that the CC algorithm with the binary relevance attribute evaluator performs better than the other algorithms w.r.t. hamming loss. Table 5 indicates that the CK-STC algorithm with the FSVIG feature selection technique performs better than the other algorithms w.r.t. one-error. Table 6 shows that the CK-STC algorithm with the FSVIG feature selection technique performs better than the other algorithms w.r.t. average precision.

(4) Multilabel classification using the MULAN multiclass attribute evaluator feature selection technique and its comparison with CK-STC with FSVIG feature selection

The results of the FSVIG multilabel feature selection algorithm and the MULAN multiclass attribute evaluator are presented in Tables 7, 8, and 9.

Table 7 Comparisons of CK-STC, feature selection FSVIG with MULAN multiclass attribute evaluator w.r.t. hamming loss \( \downarrow \)
Table 8 Comparisons of CK-STC, feature selection FSVIG with MULAN multiclass attribute evaluator w.r.t. one-error \( \downarrow \)
Table 9 Comparisons of CK-STC, feature selection FSVIG with MULAN multiclass attribute evaluator w.r.t. average precision \( \uparrow \)

Tables 7, 8, and 9 indicate that the CK-STC algorithm with the FSVIG feature selection technique performs better than the other algorithms w.r.t. hamming loss, one-error, and average precision, respectively.

Tables 6, 7, 8, and 9 indicate that the FSVIG feature selection technique with CK-STC gives better performance in terms of one-error and average precision than the feature selection algorithms provided in MULAN. Multilabel classification with FSVIG feature selection does not always provide the best performance in terms of hamming loss because of the weak connection between labels in the datasets. The feature selection algorithms in the MULAN library transform the multilabel problem into a single-label problem and apply single-label feature selection methods to the result; this transformation damages the original label structure, which reduces classifier performance.

To evaluate the statistical significance of the results, the Friedman test for paired data is used, with the level of significance set at α = 0.05.
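As an illustration of how such a test is run, the sketch below applies scipy's Friedman test to per-dataset scores of three methods. The numbers are placeholders, not values from the tables above; substitute the actual per-dataset results.

```python
from scipy.stats import friedmanchisquare

# Hypothetical average-precision values on three datasets
# (Genbase, Medical, Enron) for three competing methods.
ckstc_fsvig = [0.95, 0.88, 0.63]
mulan_br    = [0.93, 0.85, 0.60]
mulan_mc    = [0.92, 0.86, 0.61]

stat, p = friedmanchisquare(ckstc_fsvig, mulan_br, mulan_mc)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.3f}")
if p < 0.05:  # significance level alpha = 0.05
    print("Reject H0: the methods' rankings differ significantly.")
else:
    print("No significant difference at alpha = 0.05.")
```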

6 Conclusions

A lazy learning approach, CK-STC, has been reported in the literature for multilabel classification; it uses coupled similarity between labels to improve prediction performance, but irrelevant features present in the database affect the prediction accuracy of the classifier.

The FSVIG multilabel feature selection algorithm is therefore incorporated into the CK-STC algorithm. FSVIG uses information gain and shows better performance than existing multilabel feature selection algorithms when used with the ML-NB, ML-kNN, and RandSvm classifiers. This paper investigates the performance of FSVIG when used with CK-STC and compares it with other multilabel feature selection algorithms available in MULAN on standard multilabel datasets.

Experimental results show that FSVIG, when used with CK-STC, provides better performance in terms of average precision and one-error.