1 Introduction

Cognition is an instinctive learning ability of human beings. Humans transfer feature information to the brain through perception; the brain then processes this feature information and memorizes it for the given objects. According to cognitive science, the human brain can be imitated but cannot be completely reproduced. Artificial intelligence is currently an important direction for the functional imitation of the human brain.

Neural computing and the family of neural networks (NN), which are based on the working mechanism of neurons, have achieved great success in many areas. More recently, statistical learning theory and support vector machines (SVM) have drawn extensive attention and shown attractive performance in various areas compared with NN, which implies that artificial intelligence can also be realized through advanced statistical computing theory.

It should be noted that, for both NN and SVM, the functional imitation of the human cognitive process for pattern classification can be explained as follows [1]. Given the training pairs (sample features, class indicator), we can train an NN or an SVM learning machine. The training process of these learning machines imitates the learning ability of human beings; for clarity, we call this process “cognizing”. The trained NN or SVM can then be used to test an unknown sample and determine the class indicator it belongs to. This testing process imitates the recognizing process of human beings; we call it “recognizing”.

From a mathematical point of view, both learning machines are based on hyperplane adjustment, and they obtain the optimal or sub-optimal hyperplane combinations after training. In an NN, each neuron acts as a hyperplane in the feature space. The feature space is divided into many partitions according to the selected training principle, and each partition is linked to a corresponding class, which accomplishes the “cognizing” process. Given an unknown sample, one only needs to detect the partition in which the sample is located and assign the corresponding class indicator, which accomplishes the “recognizing” process. Like NN, SVM is based on an optimal hyperplane; unlike NN, the standard SVM determines the hyperplane by solving a convex optimization problem. The two share the same “cognizing” and “recognizing” processes and differ only in their solving strategies.

If a totally unknown, novel sample arrives, neither SVM nor NN will recognize it as such; instead, they assign it the closest indicator among the learned classes [2, 3]. This is generally wrong, and it is the problem this paper addresses. The root cause of this phenomenon lies in the learning principle itself, which is based on feature space partition. Such a learning principle may amplify each class's region, especially when the training samples are few or incomplete, and it makes automatic detection of novel samples impossible. The question is therefore: how can the learning machine be made clever enough to identify novel samples automatically?

The rest of this paper is organized as follows. Section 2 introduces the basic concepts of the enclosing learning machine. Section 3 describes the proposed one-class description algorithm based on the MVEE cognitive learner, and shows how to build the corresponding cognitive learner in a kernel-defined feature space. Section 4 presents a novel two-class classification algorithm based on a single MVEE cognitive learner. Experimental results are presented in Sect. 5, and Sect. 6 contains some concluding remarks.

2 Enclosing machine learning

As is well known, human beings generally cognize things of one kind, and easily recognize completely unknown things of a novel kind. The question is, why not make the learning machine “cognize” and “recognize” like a human being? In other words, the learning machine should “cognize” the training samples of each class separately: each class is cognized, or described, by a cognitive learner that uses some kind of model to describe the class instead of partitioning the feature space, thereby imitating the “cognizing” process. The bounding, closed boundaries of the cognitive learners are scattered in the feature space. For an unknown sample, the cognitive recognizer detects whether the sample is located inside a cognitive learner's boundary, thereby imitating the “recognizing” process. If the sample is completely new (i.e., none of the trained cognitive learners contains it), it can in turn be described by a new cognitive learner, and the newly obtained learner can be added to the feature space without changing the others. We call this feedback process active self-learning. These are the basic concepts of enclosing machine learning; the basic flow is depicted in Fig. 1.

Fig. 1 Enclosing machine learning paradigm. The solid line denotes the cognizing process, the dashed line the recognizing process, and the dash-dotted line the active self-learning process

Now we can consider the definition of the cognitive learner. The cognitive learner should have at least the following features:

  a. regular and convenient to calculate,

  b. bounding with the minimum volume,

  c. convex bodies to guarantee optimality,

  d. fault tolerant to guarantee generalization performance.

Basic geometric shapes are perhaps the best choices, because they are all convex bodies and operations such as intersection, union, or complement of basic geometric shapes can be implemented easily. We therefore propose to use basic geometric shapes such as the sphere, box, or ellipsoid. The cognitive learner then uses these geometric shapes to enclose all the given samples with minimum volume in the feature space. This is the most important reason why we call this learning paradigm enclosing machine learning. We now define the cognitive learner and recognizer.

Definition 1 A cognitive learner is defined as the bounding, closed boundary of a minimum volume set which encloses all the given samples. The cognitive learner can be a sphere, an ellipsoid, or a combination of them. In this paper only the ellipsoid is investigated, for illustration.

Definition 2 A cognitive recognizer is defined as the point detection and assignment algorithm.

Figure 2 gives a geometric illustration of the differences between the enclosing machine learning and feature space partition learning paradigms. In enclosing machine learning, for the cognizing (or learning) process, each class is described by a cognitive class description learner; for the recognizing (or classification) process, we only need to check inside which bounding learner the testing sample is located. In feature space partition learning, for the learning process, every two classes are separated by a hyperplane (or another boundary form, such as a hypersphere); for the classification process, we need to check whether the sample is located on the left or right side of the hyperplane and then assign the corresponding class indicator. We can see that the feature space partition learning paradigm in fact amplifies the real distribution region of each class, whereas the enclosing machine learning paradigm obtains a more reasonable distribution region for each class.

Fig. 2 A geometric illustration of learning three-class samples via enclosing machine learning vs. the feature space partition learning paradigm. a For the depicted example, the cognitive learner is the bounding minimum volume ellipsoid, while the cognitive recognizer is the point location detection algorithm for the testing sample. b All three classes are separated by three hyperplanes; obviously, the distribution region of each class is amplified

3 MVEE cognitive learner for one class description

3.1 Preliminaries

Our concern is with covering n given points \({x_{i} \in {\Re}^{k} ,\,X: = [x_{1} \;x_{2} \; \ldots \;x_{n} ]}\) with an ellipsoid of minimum volume [4, 5]. To avoid trivialities, we also make the following assumption for the remainder of this paper, which guarantees that any ellipsoid containing \({X: = [x_{1} \;x_{2} \; \ldots \;x_{n} ]}\) has positive volume:

Assumption 1 The affine hull of \({X: = [x_{1} \;x_{2} \; \ldots \;x_{n} ]}\) spans \({{\Re}^{k}}\). Equivalently, \({\operatorname{rank} {\left[ {\begin{array}{*{20}c} {{A^{\rm T}}}\\ {{e^{\rm T}}} \end{array}} \right]} = k + 1},\) where e is a vector of ones and A denotes the n × k matrix whose rows are the given points.

Definition 3 For center \({c \in {\Re}^{k} }\) and shape matrix \({E \in S_{++}^{{k \times k}} },\) we define the ellipsoid

$${\varepsilon (E,c): = {\left\{{x \in {\Re}^{k} |(x - c)^{\rm T} E(x - c) \leqslant 1} \right\}} .}$$

where \({E \in S_{{++}}^{{k \times k}}}\) determines the shape and orientation of the ellipsoid. The lengths of the semi-axes are given by \({{\left[ {1/{\sqrt {\lambda_{1}}},1/{\sqrt {\lambda_{2}}}, \ldots, 1/{\sqrt {\lambda_{k}}}} \right]}},\) where \({{\left[ {\lambda_{1}, \lambda_{2}, \ldots, \lambda_{k}} \right]}}\) are the eigenvalues of the matrix E.
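To make Definition 3 concrete, the short sketch below builds a toy ellipsoid, reads its semi-axis lengths from the eigenvalues of E, and tests whether a point lies inside it; the array names are illustrative only.

```python
import numpy as np

# Toy shape matrix E (symmetric positive definite) and center c.
E = np.array([[2.0, 0.5],
              [0.5, 1.0]])
c = np.array([1.0, -1.0])

# Semi-axis lengths of {x : (x - c)^T E (x - c) <= 1} are 1/sqrt(lambda_i).
eigvals, eigvecs = np.linalg.eigh(E)
semi_axes = 1.0 / np.sqrt(eigvals)
print("semi-axis lengths:", semi_axes)

# Membership test for a point x.
x = np.array([1.2, -0.8])
inside = (x - c) @ E @ (x - c) <= 1.0
print("inside ellipsoid:", inside)
```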

Definition 4 For \({x_{i} \in {\Re}^{k},\,X: = [x_{1} \;x_{2} \; \ldots \;x_{n} ]},\) an MVEE cognitive learner is defined as the boundary of the minimum volume ellipsoid that encloses all the given samples.

Under Assumption 1, a natural formulation of the minimum volume enclosing ellipsoid can be obtained by solving the following convex minimization problem:

$${\begin{aligned}\,& {\mathop {\min}\limits_M}\quad - \ln \det M \\\,& s.t.\quad {\left({Mx_{i} - z} \right)}^{\rm T} {\left({Mx_{i} - z} \right)} \leqslant 1, \quad \forall i = 1,2, \ldots, n, \\\,& \quad \quad M \succ 0 \end{aligned} }$$
(1)

where \({M = {\sqrt E},\;z = {\sqrt E}\,c};\) the square root of a symmetric positive semidefinite matrix is defined as \({{\sqrt X} = V^{\rm T} {\sqrt {D_{{{\left[ {d_{{ii}}} \right]}}}}}V},\) where \({{\sqrt {D_{{{\left[ {d_{{ii}}} \right]}}}}}}\) is the element-wise square root of the diagonal matrix of eigenvalues.
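As an illustration of problem (1), the following sketch solves the primal MVEE for a toy point set with cvxpy (the paper itself uses Yalmip in Matlab, so the solver choice and all names here are assumptions).

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
points = rng.standard_normal((30, 2))      # n = 30 toy samples in R^2
n, k = points.shape

M = cp.Variable((k, k), PSD=True)          # M = sqrt(E)
z = cp.Variable(k)                         # z = sqrt(E) c

# (M x_i - z)^T (M x_i - z) <= 1 for every sample; minimize -ln det M.
constraints = [cp.norm(M @ points[i] - z) <= 1 for i in range(n)]
prob = cp.Problem(cp.Minimize(-cp.log_det(M)), constraints)
prob.solve()

c = np.linalg.solve(M.value, z.value)      # recover center c = M^{-1} z
E = M.value.T @ M.value                    # shape matrix of eps(E, c)
print("center:", c)
```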

Definition 5 Decompose \({\tilde{M} \in {\Re}^{{{\left({k + 1} \right)} \times {\left({k + 1} \right)}}} = {\left({{\sum\nolimits_{i = 1}^n {\alpha_{i} \tilde{x}_{i}}} \tilde{x}^{\rm T}_{i}} \right)}^{{- 1}} = {\left({\begin{array}{*{20}c} {s}& {{v^{\rm T}}}\\ {v}& {F}\\ \end{array}} \right)},\, \tilde{x}_{i} = {\left({\begin{array}{*{20}c} {1}\\ {{x_{i}}} \\ \end{array}} \right)}},\) where \({v = - F \tilde{z},\,F \in {\Re}^{{k \times k}}, \,v \in {\Re}^{k}},\) s is a constant, and \({\delta = 1 - s + \tilde{z}^{\rm T} F \tilde{z}}.\) The linear transformation \(f: {{\Re}^{{k + 1}} \varepsilon {\left({\tilde{M},0} \right)} \to {\Re}^{k} \varepsilon {\left({M,z} \right)}}\) is defined as \({\left\{{\begin{array}{*{20}c} {{z = \tilde{z}}}\\ {{M = \delta^{{- 1}} F}} \end{array}} \right.}.\) The ellipsoid ɛ (E,c) can then be computed from \({\left\{{\begin{array}{*{20}c} {{E = {\left({\delta^{{- 1}} F} \right)}^{\rm T} {\left({\delta^{{- 1}} F} \right)}}}\\ {{c = - F^{{- 1}} v(E)^{{- \frac{1}{2}}}}} \end{array}} \right..}\)

Lemma 1 Minimizing the volume of the ellipsoid \({\varepsilon {\left({M,z} \right)}}\) in \({{\Re}^{k}}\) is equivalent to minimizing the volume of the ellipsoid \({\varepsilon {\left({\tilde{M},0} \right)}}\) in the augmented space \({{\Re}^{{k + 1}}},\) centered at the origin, using the linear transformation f [6].

According to Lemma 1 and Definition 5, Eq. (1) can be rewritten as:

$${\begin{aligned}\,& {\mathop {\min}\limits_{\tilde{M}}}\quad - \ln \det \tilde{M}\\ & s.t.\quad \tilde{x}^{\rm T}_{i} \tilde{M} \tilde{x}_{i} \leqslant 1, \quad \forall i = 1,2, \ldots, n,\\ & \quad \quad \tilde{M} \succ 0 \end{aligned} }$$
(2)

The dual form is:

$${\begin{aligned}\,& {\mathop {\max}\limits_{\alpha_{i}}}\quad \ln \det {\sum\limits_{i = 1}^n {\alpha_{i} \tilde{x}_{i} \tilde{x}^{\rm T}_{i}}} \\\,& s.t.\quad {\sum\limits_{i = 1}^n {\alpha_{i}}} = k + 1\\ & \quad \quad 0 \leqslant \alpha_{i} \leqslant 1,\quad \forall i = 1,2, \ldots, n\\ \end{aligned} }$$
(3)

where α is the dual variable.
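Problem (3) has the form of a D-optimal design problem. As an illustrative alternative to a general-purpose convex solver, the sketch below uses a Khachiyan-style coordinate-ascent update for the classical form of this dual (the weights u relate to α by α_i = (k + 1)u_i; the extra upper bound α_i ≤ 1 in (3) is not enforced here, and all names are ours).

```python
import numpy as np

def mvee_dual_weights(points, tol=1e-6, max_iter=1000):
    """Khachiyan-style ascent for the MVEE dual; returns alpha with sum = k + 1."""
    n, k = points.shape
    X_tilde = np.hstack([np.ones((n, 1)), points])   # augmented samples (1, x_i)
    u = np.full(n, 1.0 / n)                          # normalized dual weights
    for _ in range(max_iter):
        S = X_tilde.T @ (u[:, None] * X_tilde)       # sum_i u_i x~_i x~_i^T
        m = np.einsum('ij,jk,ik->i', X_tilde, np.linalg.inv(S), X_tilde)
        j = int(np.argmax(m))
        if m[j] - (k + 1) < tol:                     # optimality gap small enough
            break
        step = (m[j] - k - 1) / ((k + 1) * (m[j] - 1))
        u = (1 - step) * u
        u[j] += step
    return (k + 1) * u                               # alpha_i = (k + 1) u_i
```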

We name this one class description method OCMVEE for clarity, where OCMVEE stands for one class minimum volume enclosing ellipsoid.

According to the KKT conditions, we get \({ \tilde{M}^{{- 1}} = {\sum\nolimits_{i = 1}^n {\alpha_{i} \tilde{x}_{i} \tilde{x}^{\rm T}_{i}}}}.\) We define \({A:A_{{ii}} = a_{i} = {\sqrt {\alpha_{i}}} \geqslant 0,\,X = {\left({\tilde{x}_{1}, \tilde{x}_{2}, \ldots, \tilde{x}_{n}} \right)}}.\) Because \({{\sum\limits_{i = 1}^n {\alpha_{i} \tilde{x}_{i} \tilde{x}^{\rm T}_{i}}} = X^{\rm T} A^{2} X},\) we have \({\tilde{M}^{{- 1}} = {\left({AX} \right)}^{\rm T} {\left({AX} \right)} = X^{\rm T} A^{2} X}\) and \({{\left({AX} \right)}{\left({AX} \right)}^{\rm T} = AXX^{\rm T} A}.\) Using the singular value decomposition, we can write \({X^{\rm T} A^{2} X = P\Lambda P^{\rm T}, \,AXX^{\rm T} A = AKA = Q\Lambda Q^{\rm T} },\) and thus obtain the important relation \({P = X^{\rm T} AQ\Lambda^{-1/2}} \). From this eigenspectrum analysis, we can infer the following lemma.

Lemma 2 For \({ \tilde{M}^{{- 1}} = {\left({AX} \right)}^{\rm T} {\left({AX} \right)} = X^{\rm T} A^{2} X}\) and \({{\left({AX} \right)}{\left({AX} \right)}^{\rm T} = AXX^{\rm T} A},\) the following identities hold [7, 8]:

$${\begin{aligned}\,& \ln \det {\left({X^{\rm T} A^{2} X} \right)} = {\sum\limits_{i:\lambda_{i} \ne 0} {\ln {\left({\lambda_{i}} \right)}}} + {\left({k + 1 - \# {\left\{{\lambda_{i} \ne 0} \right\}}} \right)}\ln {\left(0 \right)} \\\,& \ln \det {\left({AXX^{\rm T} A} \right)} = {\sum\limits_{i:\lambda_{i} \ne 0} {\ln {\left({\lambda_{i}} \right)}}} + {\left({n - \# {\left\{{\lambda_{i} \ne 0} \right\}}} \right)}\ln {\left(0 \right)}\\ \end{aligned} }$$
(4)

where λ i are the nonzero eigenvalues. (Proof omitted.)
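The shared-spectrum fact behind Lemma 2 (X^T A^2 X and AXX^T A have the same nonzero eigenvalues) can be checked numerically as follows; the toy dimensions and names are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k_plus_1 = 10, 4
X = rng.standard_normal((n, k_plus_1))         # rows are augmented samples
A = np.diag(np.sqrt(rng.random(n)))            # A_ii = sqrt(alpha_i)

e_small = np.linalg.eigvalsh(X.T @ A @ A @ X)  # (k+1) x (k+1) matrix
e_large = np.linalg.eigvalsh(A @ X @ X.T @ A)  # n x n matrix, extra zeros

# The nonzero spectra coincide (largest k+1 eigenvalues of the big matrix).
print(np.allclose(np.sort(e_small), np.sort(e_large)[-k_plus_1:]))
```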

3.2 Regularized self-adaptive MVEE cognitive learner

According to Lemma 2, zero eigenvalues may occur, so it is wise to introduce a regularization term to avoid this situation. It is natural to add a regularization term μI inside the ln det (•) objective function. From Lemma 2, we can easily conclude the following identities:

$${\begin{aligned}\,& \ln \det {\left({X^{\rm T} A^{2} X + \mu I} \right)} = {\sum\limits_{i:\lambda_{i} \ne 0} {\ln {\left({\lambda_{i} + \mu} \right)}}} + {\left({k + 1 - \# {\left\{{\lambda_{i} \ne 0} \right\}}} \right)}\ln {\left(\mu \right)} \\\,& \ln \det {\left({AXX^{\rm T} A + \mu I} \right)} = {\sum\limits_{i:\lambda_{i} \ne 0} {\ln {\left({\lambda_{i} + \mu} \right)}}} + {\left({n - \# {\left\{{\lambda_{i} \ne 0} \right\}}} \right)}\ln {\left(\mu \right)} \end{aligned} }$$
(5)

To realize this regularization, we can add an additional term \({\mu\;\operatorname{trace} {\left({\tilde{M}} \right)}},\) since \({\operatorname{trace} {\left({\tilde{M}} \right)} = {\sum\nolimits_{i = 1}^{k + 1} {\frac{1}{{\lambda_{i}}}}}},\) where λ i is an eigenvalue of \({\tilde{M}^{-1}}\). The primal regularized self-adaptive MVEE can then be written as:

$${\begin{aligned}\,& {\mathop {\min}\limits_{\tilde{M},\xi_{i}}}\quad - \ln \det \tilde{M} + \mu\;\operatorname{trace} {\left({\tilde{M}} \right)} + \frac{1}{n}{\sum\limits_{i = 1}^n {\xi_{i}}} + \nu \rho, \\\,& s.t.\quad \tilde{x}^{\rm T}_{i} \tilde{M} \tilde{x}_{i} \leqslant \rho + \xi_{i}, \quad \forall i = 1,2, \ldots, n, \\\,& \quad \quad \tilde{M} \succ 0,\xi_{i} \geqslant 0,\rho \geqslant 0, \quad \forall i = 1,2, \ldots, n \end{aligned} }$$
(6)

where ν ≥ 0 is a user-specified parameter that equals the fraction of objects outside the optimized ellipsoid, ρ is a variable that controls the volume according to ν, and ξ i are slack variables that tolerate the misclassified samples.

By introducing dual variables α i , β i , γ ≥ 0, the Lagrange dual form is:

$${\begin{aligned}\,& {\mathop {\max}\limits_{\alpha_{i}}}\quad \ln \det {\left({{\sum\limits_{i = 1}^n {\alpha_{i} \tilde{x}_{i} \tilde{x}^{\rm T}_{i}}} + \mu I} \right)} \\\,& s.t.\quad \;{\sum\limits_{i = 1}^n {\alpha_{i}}} = \nu, \\\,& \quad \quad \;0 \leqslant \alpha_{i} \leqslant \frac{1}{n}, \quad \forall i = 1,2, \ldots n \end{aligned} }$$
(7)
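For illustration, the dual (7) can be posed directly in a convex modeling tool; the minimal cvxpy sketch below uses toy data and assumed parameter values (the paper itself solves these programs with Yalmip in Matlab).

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
points = rng.standard_normal((30, 2))
n, k = points.shape
X_tilde = np.hstack([np.ones((n, 1)), points])    # rows are (1, x_i)

mu, nu = 0.1, 0.5                                 # assumed regularizer / fraction
alpha = cp.Variable(n, nonneg=True)

# Objective of (7): ln det( sum_i alpha_i x~_i x~_i^T + mu I ).
S = X_tilde.T @ cp.diag(alpha) @ X_tilde + mu * np.eye(k + 1)
prob = cp.Problem(cp.Maximize(cp.log_det(S)),
                  [cp.sum(alpha) == nu, alpha <= 1.0 / n])
prob.solve()
print("optimal log det:", prob.value)
```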

ρ*, ξ* i can be determined using the following KKT conditions:

$${\left| {\begin{array}{*{20}c} {{\alpha^{*}_{i} {\left({\rho^{*} + \xi^{*}_{i} - \tilde{x}^{\rm T}_{i} U^{*} U^{{\rm *T}} \tilde{x}_{i}} \right)} = 0}}\\ {{\beta^{*}_{i} \xi^{*}_{i} = 0,{\left({\frac{1}{n} - \alpha^{*}_{i}} \right)}\xi^{*}_{i} = 0}}\\ {{\gamma^{*} \rho^{*} = 0}} \end{array}} \right.}$$
(8)

For a given sample \({ \tilde{x}_{i} },\) we only need to check whether it is located inside the MVEE: if it satisfies \({ \tilde{x}_{i}^{\rm T} \tilde{M} \tilde{x}_{i} \leqslant 1},\) the sample is inside the MVEE; otherwise it is outside.
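A minimal sketch of this point detection step, assuming M_tilde is the (k + 1) × (k + 1) shape matrix obtained from the learner above (the function name and interface are ours):

```python
import numpy as np

def inside_mvee(x, M_tilde):
    """Return True if the augmented sample (1, x) lies inside the MVEE boundary."""
    x_aug = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    return float(x_aug @ M_tilde @ x_aug) <= 1.0
```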

3.3 Kernel regularized self-adaptive MVEE cognitive learner

The matrices \({X^{\rm T} A^{2} X}\) and \({AXX^{\rm T} A}\) have the same nonzero eigenvalues. According to (5), we have the following identity:

$${\ln \det {\left({AXX^{\rm T} A + \mu I} \right)} = \ln \det {\left({X^{\rm T} A^{2} X + \mu I} \right)} + {\left({n - {\left({k + 1} \right)}} \right)}\ln {\left(\mu \right)}}$$
(9)

Later, we will explain how to use this important equation.

As for the inner product \({XX^{\rm T}},\) we can find a kernel K which satisfies the Mercer condition and then replace the inner product, i.e., \({XX^{\rm T} \to K}.\) We get \({AXX^{\rm T} A = AKA},\) and (9) can be rewritten as:

$${\ln \det (\hbox{AKA} + \mu I) = \ln \det (X^{\rm T} AAX + \mu I) + (n - k - 1)\ln (\mu)}$$
(10)

Hence we can optimize \({\ln \det (AKA + \mu I)}\) instead of \({\ln \det (X^{\rm T} A^{2} X + \mu I)}.\) The corresponding kernel-regularized self-adaptive MVEE can be written as:

$${\begin{aligned}\,& {\mathop {\max}\limits_{\alpha_{i}}}\quad \ln \det {\left({\hbox{AKA} + \mu I} \right)} \\\,& s.t.\quad \;{\sum\limits_{i = 1}^n {\alpha_{i}}} = \nu, \\\,& \quad \quad \;0 \leqslant \alpha_{i} \leqslant \frac{1}{n}, \quad \forall i = 1,2, \ldots n\\ \end{aligned} }$$
(11)

To connect (11) with the dual variable α, we define \({G \in {\Re}^{n \times n}}\) to be a Cholesky factor of K, i.e., \({K = GG^{\rm T}}.\) Then \({AKA = AGG^{\rm T} A}\) and \({G^{\rm T} A^{2} G = {\sum\nolimits_{i = 1}^n {\alpha_{i} g_{i} g^{\rm T}_{i}}}},\) where g i is the ith row of G written as a column vector. According to (9), \({AGG^{\rm T} A}\) and \({G^{\rm T} A^{2} G}\) have the same eigenvalues, so that \({\ln \det (AGG^{\rm T} A + \mu I) = \ln \det (G^{\rm T} A^{2} G + \mu I) = \ln \det {\left({{\sum\nolimits_{i = 1}^n {\alpha_{i} g_{i} g^{\rm T}_{i}}} + \mu I}\right)}}.\) Thus, we obtain the final dual kernel-regularized self-adaptive MVEE:

$${\begin{aligned}\,& {\mathop {\max}\limits_{\alpha_{i}}}\quad \ln \det {\left({{\sum\limits_{i = 1}^n {\alpha_{i} g_{i} g^{\rm T}_{i}}} + \mu I} \right)} \\\,& s.t.\quad \;{\sum\limits_{i = 1}^n {\alpha_{i}}} = \nu, \\\,& \quad \quad \;0 \leqslant \alpha_{i} \leqslant \frac{1}{n}, \quad \forall i = 1,2, \ldots n\\ \end{aligned} }$$
(12)
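The only structural change from (7) to (12) is that the augmented samples are replaced by the rows g_i of a Cholesky factor of the kernel matrix. A minimal construction sketch, with an assumed kernel callable and a small jitter added for numerical positive definiteness:

```python
import numpy as np

def cholesky_features(X_train, kernel, jitter=1e-10):
    """Rows g_i of G with K = G G^T, to be plugged into problem (12)."""
    K = kernel(X_train, X_train)                              # n x n Gram matrix
    G = np.linalg.cholesky(K + jitter * np.eye(K.shape[0]))   # lower-triangular factor
    return G                                                  # g_i = G[i, :]
```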

Equation (12) is convex and can be solved with state-of-the-art convex programming solvers such as Yalmip [9], which yields the kernel-regularized MVEE cognitive learner. We now consider the point detection of test samples. For the ellipsoid \({\varepsilon (\tilde{M},0)},\) the Mahalanobis distance is defined as \({d{\left({\tilde{x}, \tilde{M}} \right)} = \tilde{x}^{\rm T} \tilde{M} \tilde{x}, \tilde{x} \in R^{{k + 1}}}.\) Because of the matrix \({ \tilde{M}},\) the Mahalanobis distance \({d{\left({\tilde{x}, \tilde{M}} \right)}}\) cannot be expressed directly in kernel form, so we have to find another way to derive its kernel form.

Noting that \({X^{\rm T} A^{2} X = P\Lambda P^{\rm T}},\) we have \({{\sum\nolimits_{i = 1}^n {\alpha_{i} \tilde{x}_{i} \tilde{x}^{\rm T}_{i}}} + \mu I = P{\left({\Lambda + \mu I} \right)}P^{\rm T} + P^{\bot} {\left({\mu I} \right)}P^{\bot T}}\) with \({P^{\bot} = I - P},\) \({ \tilde{M} = {\left({{\sum\nolimits_{i = 1}^n {\alpha_{i} \tilde{x}_{i} \tilde{x}^{\rm T}_{i}}} + \mu I} \right)}^{{- 1}}}\) and \({P = X^{\rm T} AQ\Lambda^{{- \frac{1}{2}}} }.\) The Mahalanobis distance can then be rewritten as:

$${d{\left({\tilde{x}, \tilde{M}} \right)} = \frac{1}{\mu}k{\left({\tilde{x}, \tilde{x}} \right)} - \frac{1}{\mu}{\mathbf{k}}^{\rm T} AQ{\left({\Lambda + \mu I} \right)}^{{- 1}} Q^{\rm T} A{\mathbf{k}}}$$
(13)

where \({{\mathbf{k}} = {\left({k{\left({\tilde{x}_{1}, x} \right)},k{\left({\tilde{x}_{2}, x} \right)}, \ldots, k{\left({\tilde{x}_{n}, x} \right)}} \right)}^{\rm T}}\) and \({Q,\Lambda }\) can be determined via \({\hbox{AKA} = Q\Lambda Q^{\rm T} .}\)
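A minimal numpy sketch of evaluating Eq. (13), assuming the dual variables alpha, the regularizer mu, a kernel function kernel(X, Y) returning the Gram matrix, and the training samples X_train are available (all names are illustrative):

```python
import numpy as np

def kernel_mahalanobis(x, X_train, alpha, mu, kernel):
    """Kernel form of the Mahalanobis distance d(x, M_tilde) in Eq. (13)."""
    A = np.diag(np.sqrt(alpha))                  # A_ii = sqrt(alpha_i)
    K = kernel(X_train, X_train)                 # n x n Gram matrix
    lam, Q = np.linalg.eigh(A @ K @ A)           # AKA = Q Lambda Q^T
    k_vec = kernel(X_train, x[None, :]).ravel()  # k_i = k(x_i, x)
    k_xx = kernel(x[None, :], x[None, :])[0, 0]
    proj = Q.T @ (A @ k_vec)                     # Q^T A k
    return (k_xx - proj @ (proj / (lam + mu))) / mu
```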

4 MVEE cognitive learner for two class classification

We now present a novel two-class classification algorithm based on a single MVEE cognitive learner. The main idea is to use an MVEE to enclose all the samples of one class (m samples, y i = 1) while trying to exclude the samples of the other class (n samples, y i = − 1). For clarity, we name this method MVEEC (minimum volume enclosing ellipsoid for classification). The problem can be formulated as:

$${\begin{aligned}\,& {\mathop {\min}\limits_{\tilde{M},\xi_{i}, \xi_{j}}}\quad - \ln \det {\left({\tilde{M}} \right)} + \frac{1}{l}{\sum\limits_{i = 1}^l {\xi_{i}}} + \nu \rho \\\,& s.t.\quad y_{i} \tilde{x}^{\rm T}_{i} \tilde{M} \tilde{x}_{i} \leqslant y_{i} \rho + \xi_{i}, \quad \forall i = 1,2, \ldots, l \\\,& \quad \quad \tilde{M} \succ 0,\rho > 0,\xi_{i} \geqslant 0,y_{i} \in {\left\{{1, - 1} \right\}}, \quad \forall i = 1,2, \ldots, l \end{aligned} }$$
(14)

where l is the total number of samples, i.e., l = m + n.

According to the optimality conditions and KKT conditions, the Lagrange dual of (14) is:

$${\begin{aligned}\,& {\mathop {\max}\limits_\alpha}\quad \ln \det {\left({{\sum\limits_{i = 1}^l {y_{i} \alpha_{i} \tilde{x}_{i} \tilde{x}^{\rm T}_{i}}}} \right)}\\ & s.t.\quad \;\;{\sum\limits_{i = 1}^l {y_{i} \alpha_{i}}} = \nu, \\\,& \quad \;\quad \;0\;\, \leqslant \alpha_{i} \leqslant \frac{1}{l}, \quad \forall i = 1,2, \ldots, l \end{aligned} }$$
(15)

where ρ*, ξ* i can be computed via the following KKT conditions:

$${\left| {\begin{array}{*{20}c} {{\alpha^{*}_{i} {\left({y_{i} \rho^{*} + \xi^{*}_{i} - y_{i} \tilde{x}^{\rm T}_{i} U^{*} U^{* T} \tilde{x}_{i}} \right)} = 0,\forall {\left({\alpha^{*}_{i} = \frac{1}{l}} \right)}}}\\ {{{\left({\frac{1}{l} - \alpha^{*}_{i}} \right)}\xi^{*}_{i} = 0,\forall {\left({0 \leqslant \alpha^{*}_{i} < \frac{1}{l}} \right)}}} \end{array}} \right.}$$
(16)
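Once (15) is solved and ρ* is recovered from (16), a plausible decision rule implied by the constraints of (14) is to compare the ellipsoidal distance of a test sample with ρ*. The sketch below is our reading, not a rule stated explicitly above:

```python
import numpy as np

def mveec_predict(x, M_tilde, rho_star):
    """Assign +1 if the augmented sample lies inside the rho*-level ellipsoid."""
    x_aug = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    return 1 if float(x_aug @ M_tilde @ x_aug) <= rho_star else -1
```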

5 Experiments

This section investigates the enclosing learning machine on two well-known benchmark datasets: a ball bearing dataset for novelty detection [10] and the iris dataset for classification [11]. To obtain a quantitative assessment of the MVEE learners, one-class SVM (OCSVM) and NuSVM (a two-class classification algorithm) are adopted on the corresponding datasets for performance comparison under the same settings. That is, we compare the OCMVEE method with OCSVM on the novelty detection problem, and the MVEEC method with NuSVM on the two-class classification problem. We use libsvm [12] to implement OCSVM and NuSVM; the MVEE enclosing learning machines are programmed in Matlab via Yalmip. Both linear and RBF kernels are investigated for all methods. The optimal parameters of OCSVM, NuSVM, MVEE, and MVEEC are determined via fivefold cross-validation.
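As an illustration of the comparison protocol only (the paper uses libsvm and Matlab/Yalmip), the following sketch selects NuSVC parameters by fivefold cross-validation on the two nonlinearly separable iris classes using scikit-learn; the parameter grid is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import NuSVC

iris = load_iris()
mask = iris.target != 0                      # keep Versicolour (1) and Virginica (2)
X, y = iris.data[mask], iris.target[mask]

grid = GridSearchCV(NuSVC(kernel='rbf'),
                    param_grid={'nu': [0.1, 0.2, 0.3, 0.5],
                                'gamma': [0.01, 0.1, 1.0]},
                    cv=5)                    # fivefold cross-validation
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```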

5.1 Novelty detection

The data come from a real ball bearing of type 6204 (steel cage), with rotational frequency 24.5625 Hz (a tacho signal was used for the measurement), sampling frequency 16384 Hz, and minimum frequency 0.7 Hz; the data acquisition system was a B&K analyzer. Each file consists of 10–12 instances, and each instance consists of 2,048 samples of acceleration. After preprocessing with a discrete fast Fourier transform, each instance had 32 attributes. The dataset consists of five categories: normal data from new ball bearings and four types of abnormalities, i.e., fault 1 (outer race completely broken), fault 2 (broken cage with one loose element), fault 3 (damaged cage with four loose elements), and fault 4 (a badly worn ball bearing with no evident damage) (Table 1).

Table 1 Ball-bearing dataset novelty detection success rate

5.2 Pattern classification

The dataset contains three classes of 50 instances each, where the classes refer to Iris Setosa, Iris Versicolour, and Iris Virginica. Each instance has four attributes: sepal length, sepal width, petal length, and petal width. The Setosa class is linearly separable from the other two classes, Versicolour and Virginica. Since Versicolour and Virginica are not linearly separable, we report only this two-class classification problem (Table 2).

Table 2 Iris dataset total classification error rate

6 Conclusions

We have proposed a novel machine learning paradigm based on minimum volume enclosing shapes, called enclosing machine learning, and illustrated its concepts and algorithms using the minimum volume enclosing ellipsoid. We developed an MVEE one-class description algorithm and a two-class classification algorithm, and validated them on benchmark datasets. The results show that the proposed MVEE enclosing learning machines are comparable to, and in some cases better than, SVMs on the datasets studied.