1 Introduction

Over the last decades, specialists in systems analysis, control and information processing have paid considerable attention to theoretical and practical problems, arising in various application areas, in which finite sets of data have to be classified (decomposed); such problems can often be reduced to the solution of separability problems.

The separation of sets whose convex hulls have a nonempty intersection is probably one of the most attractive problems of real-life interest. Such problems call for concepts of separability that are more general and more involved than linear splitting. Various definitions of generalized separability have been proposed to meet the demands of application areas [19]; among others, let us mention those employing k-functions and majority committees [1, 4]. Polyhedral separability has also shown itself to be relevant from a theoretical viewpoint and suitable for practitioners [2, 3].

Among such approaches we recall the seminal contributions by Mangasarian [2] and Rosen [5], and the fundamental breakthrough due to the introduction of the Support Vector Machine (SVM) method [6, 7]. Moreover, the field of data set classification is expanding ceaselessly and embraces new areas [4–11]. In addition, such separation problems turn out to be nonconvex, and therefore they require a new mathematical apparatus for finding a global solution, in particular, for escaping the local pits produced by local search algorithms, including the classical ones (conjugate gradient methods, Newton-type methods, SQP, IPM, etc.).

On the other hand, all theoretical proposals have to be verified with respect to their practical relevance and the possibility of solving numerically the corresponding nonconvex extremum problems. A comparison of the computational efficiency of global search algorithms, such as branch-and-bound methods, cutting schemes, the committee methodology, etc., gives rise to new ideas for constructing special procedures for solving classification and separation problems.

In this paper we consider the problem of spherical binary separability [8, 9, 11], which consists in finding a sphere separating the sets \({\mathcal {A}}\) and \({\mathcal {B}}\) in the finite-dimensional space \(I\!\!R^n\). Since it is not known in advance whether the sets can be separated by a sphere, the minimization of a classification error function arises here naturally [10]. This minimization problem turns out to be nonsmooth and nonconvex [12–16], which leads us to a nonconvex variational problem.

Nonconvex problems with nondifferentiable functions are of special interest from the viewpoint of mathematical investigation because of their additional complexity and, on the other hand, because of their wide field of applications. Lately, the modelling of many real-life problems [17–19] has led to nondifferentiable nonconvex optimization problems.

At the same time, in nonconvex problems the direct application of standard methods may have unpredictable consequences [13, 15, 16, 20, 21] and may sometimes even lead away from the desired solution. Hence, the reaction of specialists advocating methods of direct enumeration, such as branch-and-bound (and cutting) methods, seems quite natural, though hardly well grounded: as is known, these methods suffer from the curse of dimensionality, the volume of computations growing exponentially with the dimension of the problem [13, 15, 16]. We are convinced that there also exists another way of solving nonconvex problems of high dimension.

Further, the spherical separability problem for two sets can be viewed as a problem of D.C. minimization [13–15]:

$$\begin{aligned} F(x)\mathop {=}\limits ^{\triangle } g(x)-h(x)\downarrow \min \limits _{x},\;\;x\in D, \qquad ({\mathcal {P}}) \end{aligned}$$

where \(g(\cdot ),\;h(\cdot )\) are convex, not necessarily differentiable, functions and the set \(D\subset I\!\!R^n\) is also convex. From now on let us suppose that the function \(F(\cdot )\) is bounded from below over \(I\!\!R^n\):

$$\begin{aligned} \inf (F,I\!\!R^n) > -\infty . \qquad ({\mathcal {H}}) \end{aligned}$$

As is well known [13, 15, 16], the A.D. Alexandrov functions (or D.C. functions, i.e. functions that can be represented as a difference of two convex functions) form a linear space which is dense in the space of continuous functions (in the topology of uniform convergence on compact sets). Hence, D.C. programming problems represent a rather large and, moreover, very attractive class of optimization problems, for which we have developed the theory of global search [13, 15].

According to the theory developed on the basis of the global optimality conditions (GOC), the search for a global solution in nonconvex optimization problems consists of two principal stages: (i) a special local search, which takes into account the structure of the problem under scrutiny [14], and (ii) procedures (based on the global optimality conditions [13, 15]) for escaping the critical points provided by the local search.

The paper is organized as follows. Section 2 contains the problem statement. In Sect. 3 we consider a generalization of the special local search method for D.C. minimization problems to the nonsmooth case; the convergence conditions of the method and the stopping criteria are also substantiated there. In Sect. 4 the proposed method is verified on test classification problems [22–24]. After that, in Sect. 5, on the basis of the global optimality conditions [13, 15] we develop a global search strategy for nonsmooth D.C. minimization problems. In Sect. 6 we present the global search algorithm and the averaged tenfold cross-validation results of its computational testing on the classification problems from [22–24].

2 Problem statement

Consider two nonempty and disjoint finite sets \({\mathcal {A}}=\{a^1,\ldots ,a^M\}\) and \({\mathcal {B}}=\{b^1,\ldots ,b^N\}\) in the space \(I\!\!R^n\). Further, let \(S(x,r)\) denote the sphere of radius \(r >0\) centered at a point \(x\in I\!\!R^n\).

The sets \({\mathcal {A}}\) and \({\mathcal {B}}\) are said to be separated by the sphere \(S(x,r)\) if the following inequalities hold:

$$\begin{aligned} \parallel a^i-x\parallel \le r\;\;\;\forall i=1,\ldots ,M; \end{aligned}$$
(1)
$$\begin{aligned} \parallel b^j-x\parallel \ge r\;\;\;\forall j=1,\ldots ,N; \end{aligned}$$
(2)

where \(\parallel \cdot \parallel \) is the Euclidean norm in the space \(I\!\!R^n\).

The points of the sets \({\mathcal {A}}\) and \({\mathcal {B}}\) for which inequalities (1) and (2), respectively, are satisfied are called well-classified points.

In the case where one or more of the inequalities (1), (2) are violated, we introduce the classification error function, defined as follows:

$$\begin{aligned} \omega (x,r)\mathop {=}\limits ^{\triangle }\sum \limits _{i=1}^{M} \max \{0,\parallel a^i-x\parallel ^2 - r^2\} +\sum \limits _{j=1}^{N}\max \{0,r^2-\parallel b^j-x\parallel ^2\}. \end{aligned}$$
(3)

Now one can formulate the problem of spherical separation by a sphere of minimal radius as the minimization of the classification error function with respect to the variables \(x\in I\!\!R^n\) and the radius \(r\in I\!\!R \). It can be readily seen that this problem is nonconvex, the nonconvexity being generated by the terms \((- r^2)\) and \((-\parallel b^j-x\parallel ^2)\). Taking into account the definition (3) of the error function, we obtain the following problem of unconstrained nondifferentiable nonconvex optimization (\(C >0\)):

$$\begin{aligned} F(x,r)\mathop {=}\limits ^{\triangle } r^2+C\,\omega (x,r)\downarrow \min \limits _{(x,r)},\;\; x\in I\!\!R^n,\;\;r\in I\!\!R. \qquad ({\mathcal {P}}_0) \end{aligned}$$

In this problem the priority between the two goals, minimizing the radius and minimizing the classification error, is regulated by varying the constant \(C>0\). It is clear that the function F(x,r) is bounded from below by zero, so the assumption \(({\mathcal {H}})\) is fulfilled for Problem \(({\mathcal {P}}_0)\).

Further, to carry out our investigation we need an explicit D.C. representation of the nonconvex goal function \(F(\cdot )\) of Problem \(({\mathcal {P}}_0)\). Using the following equality

$$\begin{aligned} \max \{0,f_1(x)-f_2(x)\} =\max \{f_1(x),f_2(x)\}-f_2(x), \end{aligned}$$

we immediately obtain the D.C. representation as follows [10]:

$$\begin{aligned} F(x,r)=g(x,r)-h(x,r), \end{aligned}$$
(4)

where

$$\begin{aligned}&\displaystyle g(x,r)\mathop {=}\limits ^{\triangle } r^2+C\sum \limits _{i=1}^{M} \max \{r^2,\parallel a^i-x\parallel ^2\} +C\sum \limits _{j=1}^{N}\max \{r^2,\parallel b^j-x\parallel ^2\},\nonumber \\&\displaystyle h(x,r)\mathop {=}\limits ^{\triangle }CMr^2+C\sum \limits _{j=1}^{N}\parallel b^j-x\parallel ^2. \end{aligned}$$
(5)
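To make these formulas concrete, here is a minimal numerical sketch (in Python with NumPy; the arrays A and B holding the points \(a^i\), \(b^j\) row-wise and the weight C are illustrative names) of the error function (3), the goal function of Problem \(({\mathcal {P}}_0)\) and the D.C. components (5), together with a check that \(F=g-h\):

```python
import numpy as np

def omega(x, r, A, B):
    """Classification error (3); rows of A and B are the points a^i, b^j."""
    da = np.sum((A - x) ** 2, axis=1)                  # ||a^i - x||^2
    db = np.sum((B - x) ** 2, axis=1)                  # ||b^j - x||^2
    return np.maximum(0.0, da - r**2).sum() + np.maximum(0.0, r**2 - db).sum()

def F(x, r, A, B, C):
    """Goal function of Problem (P_0): squared radius plus weighted error."""
    return r**2 + C * omega(x, r, A, B)

def g(x, r, A, B, C):
    """First convex component of the D.C. representation (5)."""
    da = np.sum((A - x) ** 2, axis=1)
    db = np.sum((B - x) ** 2, axis=1)
    return r**2 + C * np.maximum(r**2, da).sum() + C * np.maximum(r**2, db).sum()

def h(x, r, A, B, C):
    """Second convex component of the D.C. representation (5)."""
    return C * A.shape[0] * r**2 + C * np.sum((B - x) ** 2)

# sanity check of the decomposition F = g - h on random data
rng = np.random.default_rng(0)
A, B = rng.normal(size=(5, 3)), rng.normal(size=(7, 3)) + 2.0
x, r, C = rng.normal(size=3), 1.5, 10.0
assert abs(F(x, r, A, B, C) - (g(x, r, A, B, C) - h(x, r, A, B, C))) < 1e-10
```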

In the next section we consider a special local search method employing this D.C. decomposition.

3 A special local search method

It is well known [13–15, 25, 26] that in nonconvex problems, in particular in D.C. programming problems, the classical convex optimization methods (conjugate gradient, Newton-type, trust region methods, etc.) turn out, in general, to be inefficient for reaching a global solution. In order to develop a search for a local solution to Problem \(({\mathcal {P}})\) we apply the well-known DC-Algorithm (DCA) [13–15, 27, 28]. As is known, it consists in linearizing, at the current point, the function \(h(\cdot )\), which defines the basic nonconvexity of Problem \(({\mathcal {P}})\), minimizing the convex linearized problem so constructed, and then solving such linearized problems consecutively. This algorithm provides critical points, which can be proved using only the tools of convex analysis [13–15, 29–31].

Now, let us describe the special local search method (SLSM) for a nonsmooth D.C. minimization Problem \(({\mathcal {P}})\).

Let a starting iterate \(x^0\in I\!\! R^n\) be given. Suppose that a point \(x^s\in D\) and a subgradient \(x^*_s\in \partial h(x^s)\) are available. Then we find \(x^{s+1}\in D\) as an approximate solution to the following linearized problem:

$$\begin{aligned} \varPhi _s(x)\mathop {=}\limits ^{\triangle } g(x)-\langle x^*_s,x\rangle \downarrow \min \limits _{x},\;\;x\in D, \qquad ({\mathcal {PL}}_s) \end{aligned}$$

so that the linearization is performed at the point \(x^s\). More precisely, this means that the next iterate \(x^{s+1}\) satisfies the following inequality:

$$\begin{aligned} g(x^{s+1})-\langle x^*_s,x^{s+1}\rangle \le \inf _{x\in D}\{g(x)-\langle x^*_s,x\rangle \}+\delta _s, \end{aligned}$$
(6)

where a sequence \(\{\delta _s\}\) fulfils the following conditions

$$\begin{aligned} \delta _s\ge 0,\;s=0,1,2,\ldots ;\;\sum _{s=0}^{\infty }\delta _s<\infty . \end{aligned}$$
(7)

Note that the linearized problem \(({\mathcal {PL}}_s)\) is convex, whereas Problem \(({\mathcal {P}})\) is nonconvex.

Let us denote the optimal value of the linearized problem \(({\mathcal {PL}}_s)\) by \({\mathcal {V(PL}}_s)\), so that

$$\begin{aligned} {\mathcal {V(PL}}_s):=\inf _{x}\{g(x)-\langle x^*_s,x\rangle \,|\,x \in D\}. \end{aligned}$$
(8)

Theorem 1

[13, 15] Suppose that the goal function \(F(\cdot )\) of Problem \(({\mathcal {P}})\) is bounded from below, and that a subgradient \(x^*\in \partial h(x)\) of the function \(h(\cdot )\) can be found at any point \(x\in D\), where D is a closed convex set in \( I\!\!R^n\).

Then

  1. i)

    the sequence \(\{x^s\}\) generated by the rule (6), (7), where \(x^*_s\in \partial h(x^s)\), satisfies the condition:

    $$\begin{aligned} \lim _{s\rightarrow \infty }\{\inf _{x}[g(x)-g(x^{s+1}) -\langle x^*_s,x-x^{s+1}\rangle ]\}=0, \end{aligned}$$
    (9)

    or what is the same

    $$\begin{aligned} \lim _{s\rightarrow \infty } [{\mathcal {V(PL}}_s)-\varPhi _s (x^{s+1})]=0. \end{aligned}$$
  2. ii)

    Furthermore, if the sequences \(\{x^s\}\) and \(\{x^*_s\}\) are converging in concordance, i.e.

    $$\begin{aligned} \begin{array}{c} x^s\rightarrow z\in D,\;\; x^*_s\rightarrow z^*\in \partial h(z),\;\; x^*_s \in \partial h(x^s), \end{array} \end{aligned}$$
    (10)

    then the limit point z of the sequence \(\{x^s\}\) satisfies the condition

    $$\begin{aligned} \inf _{x\in D}\{g(x)-g(z)-\langle z^*,x-z\rangle \}=0,\;z^*\in \partial h(z), \end{aligned}$$
    (11)

    or what is the same \({\mathcal {V(PL}}_z)=\varPhi _* (z) \mathop {=}\limits ^{\triangle } g(z) - \langle z^*, z \rangle . \)

Thus, the point z in Theorem 1 is a solution to the linearized problem

$$\begin{aligned} g(x)-\langle z^*,x\rangle \downarrow \min \limits _{x},\;\;x\in D. \qquad ({\mathcal {PL}}_z) \end{aligned}$$

Definition 1

  1. i)

    A point z is called critical if it satisfies condition (11), i.e. it is a solution to the linearized problem \(({\mathcal {PL}}_z)\) with a subgradient \(z^*\in \partial h(z)\).

  2. ii)

    A point \(x^s\) is called an approximate \(\tau \)-critical point if it is a \(\tau \)-solution to Problem \(({\mathcal {PL}}_s)\) with a subgradient \(x^*_s \in \partial h(x^s),\) i.e.

    $$\begin{aligned} g(x^s)-\langle x^*_s,x^s\rangle \le \inf \limits _{x\in D}\{g(x)-\langle x^*_s,x\rangle \}+\tau . \end{aligned}$$

Now let us address the stopping criterion for the method presented. From (6), due to the convexity of the function \(h(\cdot )\), it follows that

$$\begin{aligned} -\delta _s\le & {} \inf \limits _{x\in D}\{g(x)-\langle x^*_s,x\rangle \}-g(x^{s+1})+ \langle x^*_s,x^{s+1}\rangle \\\le & {} g(x^s)-g(x^{s+1})+\langle x^*_s,x^{s+1}-x^s\rangle \\\le & {} g(x^s)-g(x^{s+1})+h(x^{s+1})-h(x^s)=F(x^s)-F(x^{s+1}), \end{aligned}$$

which can be considered as a hint for the convergence proof of the method. Moreover, it suggests that one of the following inequalities can be employed as a stopping criterion for SLSM (together with \(\delta _s\le \tau /2\)):

$$\begin{aligned} \left. \begin{array}{c} F(x^s)-F(x^{s+1})\le \tau /2 ,\\ \varPhi _s (x^{s})-\varPhi _s (x^{s+1})\mathop {=}\limits ^{\triangle } g(x^s)-g(x^{s+1})+\langle x^*_s,x^{s+1}-x^s\rangle \le \tau /2, \end{array}\right\} \end{aligned}$$
(12)

where \(F(\cdot )\) and \(\varPhi _s(\cdot )\) are the objective functions of Problems \(({\mathcal {P}})\) and \(({\mathcal {PL}}_s)\), respectively, and \(\tau \) is a given accuracy, and \(x^*_s\in \partial h(x^s)\).

If one of the inequalities (12) is fulfilled and \(\delta _s\le \tau /2\), it can easily be shown that the point \(x^s\) is \(\tau \)-critical for Problem \(({\mathcal {P}})\). Indeed, (12) together with inequality (6) implies that

$$\begin{aligned} g(x^{s})-\langle x^*_s,x^s\rangle \le \tau /2+g(x^{s+1})- \langle x^*_s,x^{s+1}\rangle \le \inf \limits _{x\in D}\{g(x)- \langle x^*_s,x\rangle \}+\tau /2+\delta _{s}. \end{aligned}$$

Therefore, if \(\delta _s\le \tau /2\), then the point \(x^{s}\) is a \(\tau \)-solution to Problem \(({\mathcal {PL}}_s)\).

Thus, in the nonsmooth optimization Problem \(({\mathcal {P}})\) the SLSM described above provides an approximately critical point. Henceforth we assume \( D= I\!\!R^n\).
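To make the scheme concrete, here is a minimal Python sketch of the SLSM for Problem \(({\mathcal {P}}_0)\), reusing the illustrative functions g and F from the sketch in Sect. 2; SciPy's derivative-free Powell method serves here merely as a stand-in for the convex subproblem solver actually used in our experiments (Shor's r-algorithm, see Sect. 4), and the second inequality of (12) is taken as the stopping criterion.

```python
import numpy as np
from scipy.optimize import minimize

def subgrad_h(x, r, A, B, C):
    """Gradient of the (smooth, quadratic) function h from (5) at (x, r)."""
    gx = 2.0 * C * (B.shape[0] * x - B.sum(axis=0))    # d/dx of C*sum_j ||b^j - x||^2
    gr = 2.0 * C * A.shape[0] * r                      # d/dr of C*M*r^2
    return np.concatenate([gx, [gr]])

def slsm(x0, r0, A, B, C, tau=1e-3, max_iter=100):
    """Consecutive solution of linearized problems (6) with stopping rule (12)."""
    z = np.concatenate([x0, [r0]])
    for _ in range(max_iter):
        s = subgrad_h(z[:-1], z[-1], A, B, C)               # linearize h at the current point
        phi = lambda v: g(v[:-1], v[-1], A, B, C) - s @ v    # objective of (PL_s)
        z_new = minimize(phi, z, method="Powell").x          # convex subproblem (stand-in solver)
        if abs(phi(z) - phi(z_new)) <= tau / 2:              # stopping criterion (12)
            return z_new
        z = z_new
    return z
```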

4 Testing the local search method (DCA)

The local search algorithm presented above has been tested on nine test problems from [22–24]. The computational experiments were conducted on an Intel Pentium 4 CPU 3 GHz computer.

Further, the barycenter of the set \({\mathcal {A}}\):

$$\begin{aligned} x^0=(x^0_1,\ldots ,x^0_n),\;\;x^0_i=\displaystyle \frac{a^1_i+\ldots +a^M_i}{M},\;\;i=1,\ldots ,n, \end{aligned}$$
(13)

has been chosen as the starting point for the center of a possibly separating sphere, and the starting radius \(r_0\) has been chosen as the solution to Problem \(({\mathcal {P}}_0)\) with the center fixed at \(x^0\) (i.e. by minimizing over the radius only).
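In the illustrative notation of the earlier sketches, this choice of the starting point can be written as follows (minimize_scalar is an assumed stand-in for the one-dimensional solver used to obtain \(r_0\)):

```python
import numpy as np
from scipy.optimize import minimize_scalar

x0 = A.mean(axis=0)                                    # barycenter (13) of the set A
d_max = np.sqrt(np.sum((np.vstack([A, B]) - x0) ** 2, axis=1)).max()
r0 = minimize_scalar(lambda r: F(x0, r, A, B, C),      # Problem (P_0) with the center fixed
                     bounds=(0.0, d_max), method="bounded").x
```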

On the other hand, the SLSM framework requires solving a convex Problem \(({\mathcal {PL}}_s)\) at each iteration. To this end we have employed Shor's r-algorithm [18], which is based on the operation of space dilation in the direction of the difference of two successive subgradients.

In addition, we have adopted the tenfold cross-validation protocol, which consists in splitting the dataset of interest into ten equally sized pieces; nine of them are used in turn as the training set and the remaining one as the testing set. The quality of the computational results is estimated by the total percentage of well-classified points (of both sets \({\mathcal {A}}\) and \({\mathcal {B}}\)) when the algorithm stops.
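A minimal sketch of this protocol, assuming scikit-learn's KFold for the splitting, a data matrix X with labels y (+1 for points of \({\mathcal {A}}\), −1 for points of \({\mathcal {B}}\)) and the illustrative slsm routine above:

```python
import numpy as np
from sklearn.model_selection import KFold

def well_classified_fraction(X, y, center, r):
    """Fraction of points satisfying (1) (for A-points) or (2) (for B-points)."""
    inside = np.sum((X - center) ** 2, axis=1) <= r ** 2
    return np.mean(np.where(y > 0, inside, ~inside))

def cross_validate(X, y, C, n_splits=10):
    scores = []
    for train, test in KFold(n_splits=n_splits, shuffle=True).split(X):
        A_tr, B_tr = X[train][y[train] > 0], X[train][y[train] < 0]
        x0 = A_tr.mean(axis=0)                        # barycenter start (13)
        z = slsm(x0, 1.0, A_tr, B_tr, C)              # train on nine folds
        scores.append(well_classified_fraction(X[test], y[test], z[:-1], z[-1]))
    return 100.0 * np.mean(scores)                    # average testing correctness, %
```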

Table 1 presents the averaged tenfold cross-validation results of the computational testing of the local search algorithm, where the following notation is used: n is the space dimension; M and N stand for the numbers of points to be classified in the sets \({\mathcal {A}}\) and \({\mathcal {B}}\), respectively; C stands for the value of the parameter C in the goal function of Problem \(({\mathcal {P}}_0)\); \(F_0=F(x^0,r_0)\) is the starting value of the goal function of Problem \(({\mathcal {P}}_0)\); F(z) is the value of the goal function at the critical point \(z=(x,r)\) provided by SLSM; iter is the number of linearized problems solved (iterations of SLSM); time is the CPU time (in seconds) for computing a solution; % stands for the percentage of well-classified points; %[10] stands for the percentage of well-classified points reported in [10].

Table 1 Local search

As the stopping criterion for SLSM we have used the following inequalities (see (12)):

$$\begin{aligned} |\varPhi _s(x^s)-\varPhi _s(x^{s+1})|\le \frac{\tau }{2}, \;\;\; \delta _s \le \frac{\tau }{2}; \end{aligned}$$

with the accuracy \(\tau =10^{-3}\) (as in [10]).

The average number of iterations turned out to be 41, which corresponds to the number of linearized problems solved, and the average CPU time for solving one problem was 1.22 s.

From Table 1 it can readily be seen that already at the local search stage we have managed to increase the percentage of well-classified points in the problems “ionosphere”, “g50c” and “g10n”. The results of Table 1 can thus be viewed as more promising than those of [10]. This may be explained, in particular, by the choice of Shor's r-algorithm [18] for solving Problem \(({\mathcal {PL}}_s)\) instead of the subroutine NCVX [8], which implements a bundle-type algorithm for nonsmooth unconstrained optimization and was used in [10]. On the other hand, the principal effect is achieved by employing the general framework of the SLSM.

5 Optimality conditions and the global search strategy

Let us recall the fundamental result of the global search theory for D.C. programming.

Theorem 2

[13, 15] Suppose that

$$\begin{aligned} \exists v\in I\!\! R^n:\;\;F(v) > F(z)\mathop {=}\limits ^{\triangle }\zeta . \end{aligned}$$
(14)

Then point \(z\in I\!\! R^n\) is a global solution to Problem \(({\mathcal {P}})\) if and only if

$$\begin{aligned} \left. \begin{array}{l} (a)\;\;\;\;\forall (y,\beta )\in I\!\! R^{n+1} :\;\;\beta =h(y)+\zeta ,\\ (b)\;\;\;\;g(y)\le \beta \le \sup (g,I\!\! R^n),\;\;\forall y^* \in \partial h(y):\\ (c)\;\;\;\;g(x)-\beta \ge \langle y^*,x-y\rangle \;\;\forall x \in I\!\! R^n. \end{array}\right\} ({\mathcal {E}}) \end{aligned}$$

Conditions \(({\mathcal {E}})\) are related to classical optimality conditions, employ the idea of linearization with respect to the basic nonconvexity, and possess a constructive (algorithmic) property: when the variational inequality (c) of conditions \(({\mathcal {E}})\) does not hold, there exists a rule for constructing a feasible point which is better than the point z under scrutiny [13, 15].

Now introduce the following function

$$\begin{aligned}&\displaystyle \varphi (z)=\sup \limits _{x,y,\beta ,y^*}\{\langle y^*,x-y\rangle +\beta -g(x)\;|\;\beta =h(y)+\zeta ,\nonumber \\&\displaystyle g(y)\le \beta \le \sup (g,I\!\! R^n),\;y^*\in \partial h(y)\}. \end{aligned}$$
(15)

Since \(\beta =\beta _{0}\mathop {=}\limits ^{\triangle } g(z)\) for \(y=z\), the following inequality obviously holds

$$\begin{aligned} 0=\beta _{0}+\left\langle z^*,z-z\right\rangle -g(z)\le \varphi (z)\;\;\;\;\; \forall z\in I\!\! R^n\;\;\forall z^*\in \partial h(z). \end{aligned}$$

Hence

$$\begin{aligned} \varphi (z)\ge 0\;\;\;\;\; \forall z\in I\!\! R^n. \end{aligned}$$
(16)

Therefore, conditions \(({\mathcal {E}})\) are equivalent to the equality \(\varphi (z)=0\). Furthermore, optimality conditions for the minimizing sequences can be presented in the following form.

Theorem 3

[13, 15] If a sequence \(\{ z^{k}\}\) is a minimizing sequence for Problem \(({\mathcal {P}})\,(\{ z^{k}\}\in {\mathcal {M}})\), then

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\varphi (z^{k})=0. \end{aligned}$$
(17)

If, in addition, the following condition holds

$$\begin{aligned} \exists v\in I\!\! R^n,\; \exists \chi >0: F(v)\ge F(z^{k})+\chi ,\; k=0,1,2,\ldots , \end{aligned}$$
(18)

then the optimality condition (17) becomes sufficient for the sequence \(\{ z^{k}\}\) to be minimizing \((\{ z^{k}\}\in {\mathcal {M}})\) in Problem \(({\mathcal {P}})\).

The proof of this theorem does not differ in principle from the proof in the differentiable case [13].

The properties of the Optimality Conditions \(({\mathcal {E}})\) allow one to construct algorithms for solving D.C. minimization problems which comprise the following two principal stages:

  1. (a)

    local search, which provides for an approximately critical point \(z^k\) with the corresponding value of the goal function \(\zeta _k \mathop {=}\limits ^{\triangle } F(z^k)\);

  2. (b)

    procedures of escaping critical points, which are based on the Global Optimality Conditions (GOC) \(({\mathcal {E}})\).

The latter procedure can be represented as a chain of the following operations [13, 15, 29]:

  1. (1)

    Choose a number \(\beta :\; \inf (g,I\!\!R^n)\le \beta \le \sup (g,I\!\!R^n)\). For instance, one can take the initial value \(\beta _0=g(z^k)\) (so that \(\zeta _k=F(z^k)=g(z^k)-h(z^k)\)).

  2. (2)

    Construct a finite approximation

    $$\begin{aligned} {\mathcal {A}}_k(\beta )=\{ v^1,\ldots ,v^{N_k}\;|\;h(v^i) =\beta -\zeta _k,\; i=1,\ldots ,N_k,\;N_k=N_k(\beta )\} \end{aligned}$$

    of the level surface \(\{ h(x)=\beta -\zeta _k\}\) of the function \(h(\cdot )\), which generates the basic nonconvexity in Problem \(({\mathcal {P}})\).

  3. (3)

    Find a \(\delta _k\)-solution \(\bar{u}^i\) of the following Linearized Problem:

    $$\begin{aligned} g(x)-\langle v^*_i,x\rangle \downarrow \min \limits _x,\;x\in I\!\!R^n, ({\mathcal {PL}}_i) \end{aligned}$$

    where \( v^*_i \in \partial h(v^i)\) is a subgradient of \(h(\cdot )\) at the point \(v^i\), so that

    $$\begin{aligned} g(\bar{u}^i)-\langle v^*_i,\bar{u}^i\rangle -\delta _k\le \inf _x\{g(x)-\langle v^*_i,x\rangle \}. \end{aligned}$$
    (19)

    After that, apply SLSM starting at the point \(\bar{u}^i\). Let \(u^i\) be the point provided by SLSM.

  4. (4)

    Find a \(\delta _k\)-solution \(w^i\) (\(h(w^i)=\beta -\zeta _k\)) of the level problem, i.e.

    $$\begin{aligned} \langle w^*_i,u^i-w^i\rangle +\delta _k\ge \sup \limits _{v,v^*}\{\langle v^*,u^i-v\rangle |\;h(v)=\beta -\zeta _k,\;\; v^* \in \partial h(v)\}, \end{aligned}$$
    (20)

    where \( w^*_i \in \partial h(w^i)\).

  5. (5)

    Compute the value \(\eta _k(\beta ):=\eta ^0_k(\beta )+\beta ,\) where

    $$\begin{aligned} \eta ^0_k (\beta ):= \langle w^*_j,u^j-w^j\rangle -g(u^j)\mathop {=}\limits ^{\triangle }\max \limits _{i\in I_k} \{\langle w^*_i,u^i-w^i\rangle -g(u^i)\}, \end{aligned}$$
    (21)

    \( w^*_i \in \partial h(w^i),\,i\in I_k\mathop {=}\limits ^{\triangle }\{i\in \{1,\ldots ,N_k\}|\;g(v^i)\le \beta \}\).

    If \(\eta _k(\beta )>0\), then it can readily be seen that the point \(u^j\) is better than \(z^k\) with respect to the goal function. In this case one can pass to the next iteration of the algorithm and execute the local search again, starting from the point \(u^j\). If, instead, \(\eta _k(\beta )\le 0\), then a new value of \(\beta \) should be chosen and the iteration repeated.

6 The global search algorithm and its numerical testing

Now let us describe the principal stages of the algorithm developed on the basis of the global search strategy (GSS) presented above for the nonsmooth D.C. minimization problem.

Note that the SLSM already described in Sect. 3 and tested in Sect. 4 has been applied as a local search method.

Further, it is necessary to determine the boundaries for the number \(\beta \) within the interval \([\beta _-,\beta _+]\), where \(\beta _-\mathop {=}\limits ^{\triangle }\inf (g,I\!\!R^n),\;\; \beta _+\mathop {=}\limits ^{\triangle }\sup (g,I\!\!R^n)\).

However, it is clear that the search for the numbers \(\beta _-,\,\beta _+\) is equivalent to an unconstrained minimization and maximization of the function \(g(\cdot )\), respectively. It can readily be seen from (5) that the function \(g(\cdot )\) is convex and takes only nonnegative values. Therefore one can take \(\beta _-=0,\,\beta _+=+\infty \), i.e. the interval of variation of the parameter \(\beta \) is not bounded from above.

On the other hand, it is possible to estimate the upper bound \(\beta _+\) in another way, for example \(\beta _+:=g(x^0,\bar{r})\), where \(x^0\) is the barycenter of the set \({\mathcal {A}}\) from (13) and \(\bar{r}^2=\max \limits _j ||b^j||^2\). Having thus restricted the interval of variation of the parameter \(\beta \), we can change the number \(\beta \), for example, by the rule \(\beta _{p+1}:=\beta _p+\varDelta \beta ,\;\;\varDelta \beta :=(\beta _+-\beta _-)/10\). In other words, the search over \(\beta \) can be carried out by means of one of the well-known methods of one-dimensional optimization.
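In the illustrative notation used above, this bounded interval and a uniform ten-step grid for \(\beta \) might look as follows (the grid itself is a simple assumption matching the rule \(\beta _{p+1}=\beta _p+\varDelta \beta \)):

```python
import numpy as np

r_bar = np.sqrt((B ** 2).sum(axis=1).max())           # r_bar^2 = max_j ||b^j||^2
beta_minus, beta_plus = 0.0, g(x0, r_bar, A, B, C)    # beta_+ := g(x^0, r_bar)
delta_beta = (beta_plus - beta_minus) / 10.0          # step of the beta grid
betas = beta_minus + delta_beta * np.arange(1, 11)    # candidate values beta_1,...,beta_10
```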

Further, a crucial element of the global search procedure consists in constructing an approximation of the level surface of the convex function \(h(\cdot )\), which generates the basic nonconvexity in Problem \(({\mathcal {P}}_0)\). In particular, an approximation \({\mathcal {A}}_k(\beta )\) for each pair \((\beta ,\zeta _k),\;\;\zeta _k=F(z^k)\), can be constructed by the following rule:

$$\begin{aligned} v^l=z^k -\mu _le^l,\;\;l=1,\ldots ,n, \end{aligned}$$
(22)

where \(z^k\) is the current critical point and \(e^l\) is the lth unit vector of the Euclidean basis of \(I\!\!R^n\). The search for \(\mu _l\) turns out to be rather simple, and even analytical, for the quadratic function \(h(\cdot )\), since it reduces to the solution of a quadratic equation in one variable. The approximation (22) has shown itself to be rather competitive in computational simulations [13].
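For our function \(h(\cdot )\) from (5), perturbing the lth coordinate of the center x of the current critical point \(z^k=(x^k,r^k)\) leads to the quadratic equation \(CN\mu ^2+2C\sum _{j}(b^j_l-x^k_l)\,\mu +h(x^k,r^k)-(\beta -\zeta _k)=0\) in \(\mu \). A minimal sketch in the illustrative notation above (taking the root of larger magnitude, which avoids the trivial root \(\mu =0\) when \(\beta =g(z^k)\), is an assumed tie-breaking choice; the perturbation of the r-coordinate is treated analogously):

```python
import numpy as np

def approximation_point(xk, rk, l, beta, zeta_k, A, B, C):
    """One point v^l = z^k - mu_l*e^l of (22) with h(v^l, r^k) = beta - zeta_k."""
    a = C * B.shape[0]                                 # coefficient of mu^2
    b = 2.0 * C * (B[:, l] - xk[l]).sum()              # coefficient of mu
    c = h(xk, rk, A, B, C) - (beta - zeta_k)           # constant term
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None                                    # no real root: skip this direction
    roots = ((-b + np.sqrt(disc)) / (2 * a), (-b - np.sqrt(disc)) / (2 * a))
    mu = max(roots, key=abs)                           # avoid the trivial root mu = 0
    v = xk.copy()
    v[l] -= mu
    return v
```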

Let us repeat that at stage 3 of the GSS, instead of solving only one linearized problem, we additionally launch the SLSM, i.e. we solve a sequence of linearized problems until an approximate critical point is obtained. The structure of the GSS is not destroyed by this change, and we obtain a feasible point with better and more attractive properties. At stage 4 of the GSS, instead of solving the level problem and subsequently forming the value \(\eta _k\), we directly compare the value of the goal function at the point \(u^i\) provided by the local search with the value \(\zeta _k\mathop {=}\limits ^{\triangle } F(z^k)\). Indeed, it can readily be seen that \(\eta _k>0\) if and only if the point \(u^i\) is better than \(z^k\) with respect to the objective function [13].

Let us describe now the global search algorithm (GSA) step by step.

Let a starting point \(x^0\) and the accuracy \(\tau = 10^{-3}\) be given.

  1. Step 0.

    Set \(k:=0,\;\;x^k:=x^0,\;\;l:=1,\;\;\tau _k:=0.1\).

  2. Step 1.

    Starting from the point \(x^k\) and applying SLSM construct a \(\tau _k\)-critical point \(z^k\), such that \(\zeta _k\mathop {=}\limits ^{\triangle } F(z^k)\le F(x^k).\)

  3. Step 2.

    Set \(\beta :=g(z^k).\)

  4. Step 3.

    Construct the point \(v^l\) of the approximation \({\mathcal {A}}_k(\beta )\) according to the rule (22), \(h(v^l) =\beta -\zeta _k.\)

  5. Step 4.

    Starting from the point \(v^l\) construct by means of the SLSM a \(2\tau _k\)-critical point \(u^l\).

  6. Step 5.

    If \(\zeta _k > F(u^l)\) then set \(x^{k+1}:=u^l,\; \zeta _{k+1}:=F(u^l),\,l:=l+1\). If \(\tau _k >\tau \) then set \(\tau _{k+1}:=\displaystyle \frac{\tau _k}{ 2}\). After that set \(k:=k+1\) and loop to Step 1.

  7. Step 6.

    If \(\zeta _k \le F(u^l)\) and \(l<N\) then set \(l:=l+1\) and loop to Step 3.

  8. Step 7.

    If \(\zeta _k \le F(u^l)\) and \(l=N\), and \(\tau _k >\tau \) then set \(\tau _k:=\displaystyle \frac{\tau _k}{ 2},\, l:=1 \) and loop to Step 1.

  9. Step 8.

    If \(\zeta _k \le F(u^l)\) and \(l=N\), and \(\tau _k\le \tau \), and \(\beta <\beta _+\) then set \(\tau _k:=\tau ,\,\beta :=\beta +\varDelta \beta ,\, l:=1 \) and loop to Step 3.

  10. Step 9.

    If \(l=N,\; \zeta _k \le F(u^l)\; \forall \beta \in [\beta _-,\beta _+]\) (i.e. the one-dimensional search with respect to parameter \(\beta \) on the segment \([\beta _-,\beta _+]\) is over) then Stop: \(z^k\) is a feasible point provided by GSA.
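Before turning to the numerical results, here is a compact and deliberately simplified Python sketch of this scheme in the illustrative notation introduced above (a single pass over the approximation per value of \(\beta \); the \(\beta \)-grid of Step 8 is omitted). It is a rendering of the step-by-step description, not the code used in the experiments.

```python
import numpy as np

def gsa(x_start, r_start, A, B, C, tau=1e-3):
    zk = slsm(x_start, r_start, A, B, C, tau=0.1)          # Step 1: local search
    zeta, tau_k = F(zk[:-1], zk[-1], A, B, C), 0.1
    improved = True
    while improved or tau_k > tau:
        improved = False
        beta = g(zk[:-1], zk[-1], A, B, C)                 # Step 2: beta := g(z^k)
        for l in range(x_start.size):                      # Steps 3-4: points of A_k(beta)
            v = approximation_point(zk[:-1], zk[-1], l, beta, zeta, A, B, C)
            if v is None:
                continue
            u = slsm(v, zk[-1], A, B, C, tau=2 * tau_k)    # local search from v^l
            Fu = F(u[:-1], u[-1], A, B, C)
            if Fu < zeta:                                  # Step 5: a better point found
                zk, zeta, improved = u, Fu, True
                tau_k = max(tau_k / 2.0, tau)
                break
        if not improved:                                   # Steps 7-8: refine the accuracy
            if tau_k <= tau:
                break
            tau_k = max(tau_k / 2.0, tau)
    return zk, zeta
```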

Table 2 presents the averaged results of the computational testing of the GSA (using the same tenfold cross-validation protocol as in the local search testing), where \(F_0\) is the starting value of the goal function of Problem \(({\mathcal {P}}_0)\); \(F_*\) is the best value of the goal function provided by GSA; \(\omega (x^*,r_*) \) is the classification error; PL is the number of linearized problems solved during the global search; Loc is the number of start-ups of the local search procedure in the course of the global search; iter is the number of iterations of GSA; time is the CPU time of the program implementing GSA.

Table 2 Global search

Note that the average number of different critical points through which GSA passed while improving the goal function of Problem \(({\mathcal {P}}_0)\) (column iter) is 18, while the average number of start-ups of the local search procedure (column Loc) is 695.

As expected, in all test problems the largest classification error \(\omega (x^*,r_*) \) was obtained at \(C=0.1\), which corresponds to giving priority to minimizing the radius of the sphere.

Next, Table 3 demonstrates the relative efficiency of GSA under the adopted tenfold cross-validation protocol. It shows the average percentage of well-classified points obtained with the algorithms (i) GSA and (ii) DCA [9, 10]. The last column of the table gives the relative loss (“–”) or the relative win (“+”) in the percentage of well-classified points compared with the results from [9, 10].

Table 3 Average percentage of testing correctness and comparison

Note that despite the application of a rather simple approximation of the level surface and a rather simplified search over the parameter \(\beta \), our GSA is able to increase the percentage of well-classified points in nearly all the examples. In the problems “ionosphere” and “g50c” the increment has turned out to be rather substantial: more than 1.7 and 1.6 % of well-classified points, respectively, with respect to the results from [10]. As to the problems “sonar” and “g10n”, the percentage of well-classified points is more than 1 % higher in comparison with the results from [9, 10]. Only for the problem “galaxy” is the percentage of well-classified points lower, by less than 0.4 %, in comparison with the results from [9, 10].

In an attempt to increase the percentage of well-classified points in the problem “galaxy”, we added n points \(\;-v^l, \;\;l=n+1,\ldots ,2n, \) to the approximation constructed at Step 3 of the GSA. The CPU time of the program with the parameter \(C=10\) then increased to 2:29:31.68, and the percentage of well-classified points reached 90.36.

7 Conclusion

In this paper we have considered the binary separability problem, which consists in splitting two finite sets in a finite-dimensional space by means of a sphere. The problem is formulated as the minimization of the classification error function, which is nonconvex and nonsmooth. Moreover, the error function can be represented as a difference of two convex functions, so the problem falls into the area of D.C. optimization [13, 15].

This feature opens the way to seeking a global solution to the problem by means of the theory of global search developed in [13–15]. The core of this theory is formed by the Global Optimality Conditions (GOC); a formulation of the GOC suitable for our case has been given above. On the basis of the theory of global search, a global search algorithm (GSA) has been developed.

The computational testing of the developed algorithms has been carried out on well-known classification test problems [22–24] using the tenfold cross-validation technique.

The computational efficiency of the developed approach has been compared with the results from [10]. We plan to continue the comparison in order to obtain a wider field of testing results. Nevertheless, one can already see the comparative effectiveness of the global search algorithm developed above, whose implementation was a first attempt. We are now considering how it can be improved, taking into account the promising results of the computational experiments. At the same time we plan to enlarge the set of test examples, but so far we have found only [32], which together with [10] was apparently sufficient for our first attempt to attack spherical separability.