
1 Introduction

The one-class approach in machine learning has been receiving increasing attention, particularly for problems where the negative class is not well defined [1,2,3,4,5]; moreover, it has been successfully applied in various fields, including text mining [6], functional Magnetic Resonance Imaging (fMRI) [7], signature verification [8], and miRNA gene and target discovery [5, 9].

Both one-class and two-class classification methods can achieve useful classification accuracy. The advantage of one-class methods is that they require no additional effort for choosing the best way of generating the negative class. However, some data sets are composed of multiple categories or sub-classes; for such data, a single one-class classifier gives insufficient results and special methods are required.

Here, we present a new approach that divides the positive class into sub-groups and then builds a one-class classifier for each sub-group.

The key point of our classification method is to run the k-means clustering algorithm over the positive examples and then to build a classifier for each cluster separately. For a new sample, the algorithm applies all the generated classifiers: if at least one of them classifies the sample as positive, it is labeled positive; otherwise it is labeled negative.

The problem of how to implement a multi-class classifier with an ensemble of one-class classifiers has attracted much attention in the machine-learning community [10, 11]. Tax et al. [12] proposed a method for combining different one-class classifiers, which dramatically improves the performance and robustness of classification, and applied it to the handwritten digit recognition problem. A similar method was investigated by Lai et al. [13] for the image retrieval problem; they reported that combining multiple SVM-based classifiers improves retrieval precision.

Several methods for handling missing feature values based on combining one-class classifiers have been suggested by Juszczak et al. [14]. They indicated that their methods are more flexible and more robust to small-sample-size problems than standard methods such as the linear programming dissimilarity-data description. Ban et al. [15] suggested using multiple one-class classifiers to deal with the feature space and the nonlinear classification problem. They trained one one-class classifier on each class and then derived a decision function based on minimum-distance rules. Their experiments showed that the proposed method outperforms the OC-SVM.

Lyu and Farid [16] proposed multiple one-class SVMs combined with a prior clustering step for detecting hidden (steganographic) messages in digital images. They show how multiple one-class SVMs greatly simplify the training stage of the classifier. Their results show that although the overall detection rate improves as the number of hyper-spheres (i.e., clusters) increases, the false-positive rate also begins to increase considerably. Recently, Menahem et al. [17] described a different multiple one-class classification approach, called TUPSO, which combines multiple one-class classifiers via a meta-classifier. They showed that TUPSO outperforms existing methods such as the OC-SVM.

Much work on multiple one-class classification also exists in the computational biology community. Spinosa et al. [18] suggested a multiple one-class classification technique to detect novelty in gene expression data. They combined different one-class classifiers, such as OC-Kmeans and OC-KNN. For a given example, each classifier votes 'positive' or 'outlier', and the final decision is taken by the majority vote of all classifiers. Their results showed that the robustness of the classification increases because each classifier judges the examples from a different point of view. Recently, a similar approach was proposed by Zhang et al. [19] for the avian influenza outbreak classification problem.

Our new approach distinguishes itself from these predecessors in that it first clusters the positive data into groups before the classification process. This pre-processing avoids the drawback of using only a single hyper-sphere (i.e., the one generated by the one-class classifier), which may not provide a particularly compact support for the training data. Our experiments show that our approach significantly improves classification accuracy.

The rest of this paper is organized as follows: Sect. 2 describes necessary preliminaries. Our multi one-class approach is described in Sect. 3 and evaluated in Sect. 4. Our main conclusions and future work can be found in Sect. 5.

2 Methods

2.1 One-Class Methods

In general, a binary (two-class) learning approach to miRNA discovery considers both positive (miRNA) and negative (non-miRNA) classes by providing examples of both classes to a learning algorithm in order to build a classifier that attempts to discriminate between them. The most common term for this kind of learning is supervised learning, where the labels of the two classes are known beforehand.

A one-class approach uses only the information of the target (positive) class to build a classifier that recognizes the examples belonging to its target and rejects others as outliers. Among the many classification algorithms available, we chose four one-class algorithms to compare for miRNA discovery. We give a brief description of each one-class classifier and refer the reader to [20, 21] for additional details, including a description of parameters and thresholds. The LIBSVM library [22] was used as the implementation of the SVM (one-class, using the RBF kernel function) and the DDtools toolbox [23] for the other one-class methods. The WEKA software [24] was used as the implementation of the two-class classifiers.

2.2 One-Class Support Vector Machines (OC-SVM)

Support Vector Machines (SVMs) are learning machines originally developed as a two-class approach [25, 26]. The use of a one-class SVM was originally suggested by [21]. One-class SVM is an algorithmic method that produces a prediction function trained to "capture" most of the training data. For that purpose, a kernel function is used to map the data into a feature space, where the SVM finds the hyper-plane with maximum margin from the origin of the feature space. In this setting, the margin maximized between the two classes (in a two-class SVM) becomes the distance between the origin and the support vectors, which define the boundary of the enclosing circle (or hyper-sphere in a high-dimensional space) that surrounds the single class.
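As an illustration, a minimal sketch of training a one-class SVM with an RBF kernel is given below. It uses scikit-learn's OneClassSVM (which wraps LIBSVM) rather than LIBSVM directly, and the data and the nu parameter are placeholders rather than the settings used in our experiments.

    import numpy as np
    from sklearn.svm import OneClassSVM

    # Positive (target) training examples only, one row per sample.
    X_pos = np.random.randn(200, 2)

    # nu bounds the fraction of training points treated as outliers;
    # gamma controls the width of the RBF kernel.
    oc_svm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
    oc_svm.fit(X_pos)

    # predict() returns +1 for samples inside the learned region, -1 for outliers.
    labels = oc_svm.predict(np.array([[0.1, -0.2], [5.0, 5.0]]))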

2.3 One-Class Gaussian (OC-Gaussian)

The Gaussian model is a density estimation model. The assumption is that the target samples follow a multivariate normal distribution; therefore, for a given test sample z in an n-dimensional space, the probability density function is calculated as:

$$ p(z) = \frac{1}{(2\pi )^{\frac{n}{2}} \left|\Sigma\right|^{\frac{1}{2}}}\, e^{-\frac{1}{2}\left( z - \mu \right)^{T}\Sigma^{-1}\left( z - \mu \right)} $$
(1)

where \( \mu \) and \( \Sigma \) are the mean and covariance matrix of the target class, estimated from the training samples.
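For illustration, the decision rule based on Eq. (1) can be sketched as follows; the density threshold here is a hypothetical user-chosen value, not one used in our experiments.

    import numpy as np
    from scipy.stats import multivariate_normal

    X_pos = np.random.randn(500, 3)          # positive training samples
    mu = X_pos.mean(axis=0)                  # estimated mean of the target class
    sigma = np.cov(X_pos, rowvar=False)      # estimated covariance matrix

    # p(z) from Eq. (1): accept z as a target sample if its density
    # exceeds a user-specified threshold.
    density = multivariate_normal(mean=mu, cov=sigma)
    z = np.zeros(3)
    is_target = density.pdf(z) > 1e-3        # hypothetical threshold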

2.4 One-Class Kmeans (OC-Kmeans)

Kmeans is a simple and well-known unsupervised machine learning algorithm used to partition the data into k clusters. Using OC-Kmeans, we describe the data as k clusters, or more specifically as k centroids, one derived from each cluster. For a new sample z, the distance d(z) is calculated as the minimum distance over all centroids. Then, based on a user-specified threshold, the classification decision is made: if d(z) is less than the threshold, the new sample belongs to the target class; otherwise it is rejected.
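A minimal sketch of this decision rule is given below, using scikit-learn's KMeans; the number of clusters and the distance threshold are placeholder values, not those of the DDtools implementation used in our experiments.

    import numpy as np
    from sklearn.cluster import KMeans

    X_pos = np.random.randn(500, 2)          # positive training samples
    km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_pos)

    def oc_kmeans_accepts(z, threshold):
        # d(z): minimum Euclidean distance from z to any centroid.
        d = np.linalg.norm(km.cluster_centers_ - z, axis=1).min()
        return d < threshold

    accepted = oc_kmeans_accepts(np.array([0.0, 0.0]), threshold=1.5)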

2.5 One-Class K-Nearest Neighbor (OC-KNN)

The one-class nearest neighbor classifier (OC-KNN) is a modification of the well-known two-class nearest neighbor classifier that learns from positive examples only. The algorithm stores all the training examples as its model; then, for a given test example z, the distance d(z,y) to its nearest neighbor y (y = NN(z)) is calculated. The new sample belongs to the target class when:

$$ \frac{d(z,y)}{{d\left( {y,NN(y)} \right)}} < \delta $$
(2)

where NN(y) is the nearest neighbor of y; in other words, it is the nearest neighbor of the nearest neighbor of z. The default value of \( \delta \) is 1. For k > 1, the average distance of the k nearest neighbors is used in the OC-KNN implementation.
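A minimal sketch of the rule in Eq. (2) for the simplest case k = 1 is shown below, using scikit-learn's NearestNeighbors; it is an illustrative reading of the rule, not the implementation used in our experiments.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    X_pos = np.random.randn(300, 2)          # stored positive training examples
    nn = NearestNeighbors(n_neighbors=2).fit(X_pos)

    def oc_knn_accepts(z, delta=1.0):
        # y = NN(z): nearest stored training example to z.
        d_z, idx = nn.kneighbors(z.reshape(1, -1), n_neighbors=1)
        y = X_pos[idx[0, 0]]
        # d(y, NN(y)): distance from y to its own nearest neighbor,
        # skipping y itself (which lies at distance 0 in the training set).
        d_y, _ = nn.kneighbors(y.reshape(1, -1), n_neighbors=2)
        return d_z[0, 0] / d_y[0, 1] < delta

    accepted = oc_knn_accepts(np.array([0.2, -0.1]))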

3 Multi One-Class Classifiers

For a given data set, the positive class might consist of several distinct clusters (see Fig. 1). If we try to find the smallest hyper-sphere that contains all the data points belonging to the positive class, then negative data points will also fall inside the hyper-sphere, yielding low performance.

Fig. 1. The positive class consists of four sub-groups. The negative class is shown in blue. Each cluster has a different color (pink, green, black, and red). (Color figure online)

To handle this type of data, we propose Multi One-Class Classifiers, which work with small groups of the positive data. Our approach is based on the OC-SVM, which builds a hyper-sphere for a given set of positive data, and our premise is that working with compact groups is better than working with one wide group.

Accordingly, we run a clustering step to identify similar data points. The main idea of our approach is to cluster the positive data into several clusters using the k-means clustering algorithm and then to build a classifier from each cluster using the OC-SVM, as can be seen in Fig. 2.

Fig. 2. The Multi-OC trained over the positive examples. As can be seen, the positive examples are classified into four different hyper-spheres.

For a given new example, we apply all the classifiers. If all of them reject it, it is considered a negative example; otherwise it is considered as belonging to the one-class (positive) data. In this work, without loss of generality, we chose to evaluate our Multi One-Class approach using the OC-SVM as the base classification algorithm and the standard k-means as the clustering method. We leave other possible combinations for future work.
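A minimal sketch of this scheme under these choices is given below, using scikit-learn's KMeans and OneClassSVM; k, nu, and gamma are placeholder parameters, not the tuned values used in our experiments.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import OneClassSVM

    def train_multi_oc_svm(X_pos, k=4, nu=0.05, gamma="scale"):
        # Step 1: cluster the positive examples into k groups.
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_pos)
        # Step 2: train one OC-SVM per cluster.
        return [OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_pos[labels == c])
                for c in range(k)]

    def predict_multi_oc_svm(classifiers, X):
        # A sample is positive if at least one per-cluster classifier accepts it;
        # it is rejected only when all classifiers reject it.
        votes = np.stack([clf.predict(X) for clf in classifiers])   # rows of +1 / -1
        return np.where((votes == 1).any(axis=0), 1, -1)

    models = train_multi_oc_svm(np.random.randn(400, 2), k=4)
    y_pred = predict_multi_oc_svm(models, np.random.randn(10, 2))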

4 Results

The first experiment is a typical one that illustrates the performance of Multi-OC-SVM versus that of the standard OC-SVM over a synthetic data set (see Fig. 1). In each experiment, the data set is split into two subsets, one for training and one for testing. Both algorithms were trained using 80% of the positive class, and the remaining 20% together with all the negative examples were used for testing. The positive examples are divided into four clusters beforehand; we used the standard k-means algorithm for this purpose.
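For concreteness, a sketch of this evaluation protocol is given below. The synthetic data generator is a hypothetical stand-in for the data set of Fig. 1, and the helpers train_multi_oc_svm and predict_multi_oc_svm are those sketched in Sect. 3.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Hypothetical synthetic data: positives drawn from four Gaussian blobs,
    # negatives scattered elsewhere in the plane.
    centers = [(0, 0), (6, 0), (0, 6), (6, 6)]
    X_pos = np.vstack([np.random.randn(100, 2) + c for c in centers])
    X_neg = np.random.uniform(-10, 16, size=(200, 2))

    # 80% of the positives for training; the rest plus all negatives for testing.
    X_train, X_pos_test = train_test_split(X_pos, train_size=0.8, random_state=0)
    X_test = np.vstack([X_pos_test, X_neg])
    y_test = np.hstack([np.ones(len(X_pos_test)), -np.ones(len(X_neg))])

    models = train_multi_oc_svm(X_train, k=4)
    acc = accuracy_score(y_test, predict_multi_oc_svm(models, X_test))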

As can be seen from Fig. 3, executing the OC-SVM method results in poor classification. This is to be expected, since OC-SVM classifies part of the positive examples as negative ones (or as outliers). Figure 4 depicts the classification produced by the Multi-OC-SVM. As can be seen, Multi-OC-SVM clearly outperforms OC-SVM.

Fig. 3. The Multi-OC trained over the positive examples. As can be seen, the positive examples are classified into four different hyper-spheres.

Fig. 4. The accuracy of OC-SVM and Multi-OC-SVM as a function of the number of clusters generated by the k-means clustering algorithm.

Next, we ran both OC-SVM and Multi-OC-SVM twenty times over the data set in order to evaluate their performance and stability using different values of k. The data set used for this experiment is similar to the first one. Additionally, the Matthews Correlation Coefficient (MCC) measure (see [27] for more details) is used to take into account both over-prediction and under-prediction in imbalanced data sets. Looking at Fig. 4, one can see that the accuracy of Multi-OC-SVM is far better than that of OC-SVM. Furthermore, it shows that Multi-OC-SVM is stable over different numbers of clusters.
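As an illustration of how these measures could be computed for a range of k values, a short sketch (reusing the helpers and the train/test split from the previous sketches; not the exact script behind Fig. 4) is:

    from sklearn.metrics import accuracy_score, matthews_corrcoef

    # X_train, X_test, y_test as defined in the evaluation sketch above.
    for k in range(1, 11):
        models = train_multi_oc_svm(X_train, k=k)
        y_pred = predict_multi_oc_svm(models, X_test)
        print(k, accuracy_score(y_test, y_pred), matthews_corrcoef(y_test, y_pred))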

5 Conclusion

The current results show that it is possible to build a multi one-class classifier, combined with a prior clustering step and based only on positive examples, that yields reasonable performance.

Further research can proceed in several interesting directions. First, the suitability of our framework for different data types can be investigated. Second, it would be interesting to apply our approach to other types of classifiers and to more robust clustering methods such as Mean-Shift [28]. Lastly, it would be interesting to test our approach on more real data sets and real problems.