
Introduction

The paper deals with the multiclass pattern recognition problem in a geometric formulation. Different approaches to this problem can be found in [1, 2, 5, 8, 12, 15, 18, 19, 21], and mathematical models for applied pattern recognition problems are considered in [1, 2, 3, 4, 12, 13]. This paper proposes a method for solving the problem based on the idea of separability of the convex hulls of the training sample sets. Convex-hull and other efficient linear approaches to similar problems were also proposed in [2, 6, 7, 17]. To implement the method, two auxiliary problems are considered: selecting the extreme points of a finite set of points in the space \({{\mathbb {R}}^{n}}\), and determining the distance from a given point to the convex hull of a finite set of points in \({{\mathbb {R}}^{n}}\), both solved with standard software packages for mathematical programming. The efficiency and power of the proposed approach are demonstrated on the classical Fisher's Iris problem [16, 22] as well as on several applied economic problems.

Let a set of n-dimensional vectors be given in the space \({{\mathbb {R}}^{n}}\)

$$\begin{aligned} {\text {A}}=\left\{ {{{\text {a}}}_{i}}=\left( {{a}_{i1}},{{a}_{i2}},...,{{a}_{in}} \right) \right\} :i=\left[ 1,N \right] , \end{aligned}$$
(1)

and let there also be given a separation of this set into m classes

$$\begin{aligned} {\text {A}}={{A}_{1}}\,\dot{\cup }\,{{A}_{2}}\,\dot{\cup }\,\ldots \,\dot{\cup }\,{{A}_{m}}. \end{aligned}$$
(2)

The task is to construct a decision rule for assigning an arbitrary vector \(x\in {{\mathbb {R}}^{n}}\) to one of the m classes.

There are a number of methods  [14, 23] for solving this multiclass pattern recognition problem in a geometric formulation: linear classifiers, committee constructions, multiclass logistic regression, support vector machines, nearest neighbors, and potential functions. These are metric classification methods: they rely on the ability to measure the distance between classified objects, or between objects and the hypersurfaces that separate classes in the feature space. This paper develops an approach based on the convex hulls of the subsets \({{A}_{i}},\ i=\left[ 1,m \right] \), of the family \({\text {A}}\).

1 Multiclass Pattern Recognition Algorithm Based on Convex Hulls

The main idea of the proposed approach is as follows.

Suppose that, for the given family of points \(\mathrm {A}\) separated into m classes \(A_i\), \(i\in \left[ 1 , m \right] \), each convex hull \(conv\ A_{i}\) contains only points of the class \({{A}_i}\). Then it is natural to assume that any point \(x\in conv\ A_i\) represents a vector belonging to the class \(A_i\). Below, we extend this idea to the general case.

Definition 1

The set \({{A}_{i}}\) from (2), where \(i\in \left[ 1, m\right] \), is called a convex separable set (CS-set, CSS) if the following holds

$$\begin{aligned} conv\ {{A}_{i}}\cap {{A}_{j}}=\varnothing ,\text { }\forall \ j\in \left[ 1,m \right] \backslash \left\{ i \right\} . \end{aligned}$$
(3)

If the family \(\mathrm {A}=\left\{ A_1,\ldots ,A_m\right\} \) contains a CSS \(A_{i_0}\), then it is natural to assume that each point \(x\in conv\ A_{i_0}\) belongs to the set \(A_{i_0}\). In this case the set \(A_{i_0}\) can be excluded from the further construction of the decision rule. In other words, the condition \(x\in conv\ A_{i_0}\) must be checked first, and the process of assigning the point x to one of the classes of the training sample continues if and only if \(x\not \in conv\ A_{i_0}\).

An interesting case of families (1) arises when one can specify a sequence \(({{i}_{1}}, {{i}_{2}},\ldots , {{i}_{m}})\), a permutation of \((1,2,\ldots , m)\), such that

$$\begin{aligned} \left\{ \begin{array}{*{35}{l}} conv\ {{A}_{{{i}_{1}}}}\cap \bigcup \limits _{k=2}^{m}{{{A}_{{{i}_{k}}}}}=\varnothing , \\ conv\ {{A}_{{{i}_{2}}}}\cap \bigcup \limits _{k=3}^{m}{{{A}_{{{i}_{k}}}}}=\varnothing , \\ \ldots , \\ conv\ {{A}_{{{i}_{m-1}}}}\cap {{A}_{{{i}_{m}}}}=\varnothing . \\ \end{array} \right. \end{aligned}$$
(4)

The problem of constructing a decision rule for a family (1) with properties (4) will be called CSS-solvable.

We denote by class(x) the number of the class in [1, m] to which the point x belongs. Thus, if \(x\in {{A}_{i}}\), \(i\in [1, m]\), then \(class(x)=i\). For a point \(x\not \in \text {A}\), the pattern recognition problem in the geometric formulation is to construct a decision rule that determines class(x) for \(x\in {{\mathbb {R}}^{n}}{\setminus } \text {A}\).

Let’s consider the case of \(m=2\), i.e. \({\text {A}}={{A}_{1}}\,\dot{\cup }\,{{A}_{2}}\). Let’s construct convex hulls \(conv\ {{A}_{1}}\) and \(conv\ {{A}_{2}}\). It is natural to assume that if \(x\in conv\ {{A}_{1}}{\setminus } conv\ {{A}_{2}}\), then \(class(x)=1\).

Similarly, if \(x\in conv\ {{A}_{2}}{\setminus } conv\ {{A}_{1}}\), then we assume that \(class(x)=2\). If \(x\notin conv\ {{A}_{1}}\cup conv\ {{A}_{2}}\), it is natural to assume that the point x belongs to such a class whose convex hull is located closer to the point x.

Let’s denote by \( \rho \left( x, conv\ {{A}^{'}} \right) \) the distance from the point x to the convex hull of a finite set \({{A}^{'}}\subset {{\mathbb {R}}^{n}}\). Then we have

$$ class(x)=\arg \underset{i\in \{1,2\}}{\mathop {\min }}\,\rho \left( x, conv\ {{A}_{i}} \right) ,\ \text {if}\ x\not \in conv\ {{A}_{1}}\cup conv\ {{A}_{2}}. $$

Finally, let’s consider the case of \(x\in conv\ {{A}_{1}}\cap conv\ {{A}_{2}}\).

Let’s consider the following two sets:

$$\begin{aligned} A_{1}^{'}={{A}_{1}}\cap conv\ {{A}_{1}}\cap conv\ {{A}_{2}},\qquad A_{2}^{'}={{A}_{2}}\cap conv\ {{A}_{1}}\cap conv\ {{A}_{2}}. \end{aligned}$$

(5)

Logically, the following cases are possible:

1. \(x\in conv\ A_{1}^{'}{\setminus } conv\ A_{2}^{'}\);
2. \(x\in conv\ A_{2}^{'}{\setminus } conv\ A_{1}^{'}\);
3. \(x\notin conv\ A_{1}^{'}\cup conv\ A_{2}^{'}\);
4. \(x\in conv\ A_{1}^{'}\cap conv\ A_{2}^{'}\).

Following the assumption mentioned above, i.e. \({x\in conv\ {{A}_{1}}\cap conv\ {{A}_{2}}}\), cases 1 and 2 give \(class(x)=1\) and \(class(x)=2\) respectively, and in case 3 the point x is assigned to the class whose convex hull is nearer to it.

Case 4 leads us to the following situation.

We have a family of two subsets \({\text {A}}^{'}=\left\{ A_1^{'},A_2^{'} \right\} \), both located inside the set \(conv\ A_1\cap conv\ A_2\). One needs to construct a decision rule for assigning the vector \(x\in conv\ A_1\cap conv\ A_2\) to one of the two classes \(A_1^{'}, A_2^{'}\) and, respectively, \(A_1, A_2\).

This problem has the same form as the original one, and therefore the proposed algorithm can be re-applied. Repeating the process, we arrive at a situation where, for the resulting sets of the form (5), \(conv\ A_{1}^{''}\cap conv\ A_{2}^{''}=\varnothing \) holds, and the process terminates.
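The two-class recursion described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: hull membership is tested with a plain LP feasibility check via `scipy.optimize.linprog` (assumed available), the function names `in_hull` and `classify2` are hypothetical, and for brevity the case-3 fallback measures the distance to the nearest training point rather than to the convex hull itself.

```python
import numpy as np
from scipy.optimize import linprog

def in_hull(x, pts):
    """Is x a convex combination of pts? LP feasibility test:
    find alpha >= 0 with sum(alpha) = 1 and sum_j alpha_j a_j = x."""
    A = np.asarray(pts, dtype=float)
    m = len(A)
    A_eq = np.vstack([A.T, np.ones(m)])
    b_eq = np.append(np.asarray(x, dtype=float), 1.0)
    res = linprog(np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)
    return res.status == 0  # status 0 means a feasible alpha was found

def classify2(x, A1, A2):
    """Two-class CSS classifier: cases 1-3 decide directly; case 4
    restricts both samples to conv A1 ∩ conv A2 and recurses."""
    if not A1:
        return 2
    if not A2:
        return 1
    in1, in2 = in_hull(x, A1), in_hull(x, A2)
    if in1 and not in2:
        return 1                       # case 1
    if in2 and not in1:
        return 2                       # case 2
    if not in1 and not in2:            # case 3 (nearest-sample fallback)
        d1 = min(np.linalg.norm(np.subtract(x, a)) for a in A1)
        d2 = min(np.linalg.norm(np.subtract(x, a)) for a in A2)
        return 1 if d1 <= d2 else 2
    # case 4: A_i' = A_i ∩ conv A1 ∩ conv A2 (a point lies in its own hull)
    B1 = [a for a in A1 if in_hull(a, A2)]
    B2 = [a for a in A2 if in_hull(a, A1)]
    return classify2(x, B1, B2)

# toy data: the hull of A2 sits inside the hull of A1
A1 = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
A2 = [(1, 1), (3, 1), (3, 3), (1, 3)]
```

For the query point (2.5, 2.5), which lies in both hulls, one recursion step shrinks \(A_1\) to the single interior point (2, 2), after which case 2 fires and class 2 is returned; Proposition 1 below guarantees this recursion terminates whenever \(A_1\cap A_2=\varnothing\).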

Proposition 1

If the sets \({{A}_{1}}, {{A}_{2}}\) satisfy \({{A}_{1}}\cap {{A}_{2}}=\varnothing \), then the algorithm described above converges, i.e. for any point x it leads to case 1, 2 or 3 (*).

Proof

Let’s consider the following chain of pairs of sets

\(A={{A}_{1}}\cup {{A}_{2}},\ \ {{C}_{1}}=conv\ {{A}_{1}},{{C}_{2}}=conv\ {{A}_{2}}\):

\(A_{1}^{(1)}={{A}_{1}}\cap {{C}_{1}}\cap {{C}_{2}}\)

\(A_{2}^{(1)}={{A}_{2}}\cap {{C}_{1}}\cap {{C}_{2}}\)

\({{A}^{(1)}}=A_{1}^{(1)}\cup A_{2}^{(1)},\ \ C_{1}^{(1)}=conv\ A_{1}^{(1)},C_{2}^{(1)}=conv\text { }A_{2}^{(1)}\)

\(A_{1}^{(2)}=A_{1}^{(1)}\cap C_{1}^{(1)}\cap C_{2}^{(1)}\)

\(A_{2}^{(2)}=A_{2}^{(1)}\cap C_{1}^{(1)}\cap C_{2}^{(1)}\)

\(\ldots \)

\({{A}^{(k-1)}}=A_{1}^{(k-1)}\cup A_{2}^{(k-1)},\ \ C_{1}^{(k-1)}=conv\ A_{1}^{(k-1)},C_{2}^{(k-1)}=conv\ A_{2}^{(k-1)}\)

\(A_{1}^{(k)}=A_{1}^{(k-1)}\cap C_{1}^{(k-1)}\cap C_{2}^{(k-1)}\)

\(A_{2}^{(k)}=A_{2}^{(k-1)}\cap C_{1}^{(k-1)}\cap C_{2}^{(k-1)}\)

\({{A}^{(k)}}=A_{1}^{(k)}\cup A_{2}^{(k)},\ \ C_{1}^{(k)}=conv\ A_{1}^{(k)},C_{2}^{(k)}=conv\ A_{2}^{(k)}\)

\(\ldots \)

Let’s show that at some step one of the conditions \(A_{1}^{(k)}=\varnothing \) or \(A_{2}^{(k)}=\varnothing \) will hold, which means that the proposed algorithm converges.

Let’s show that at any step we have \(\left| A_{1}^{(k+1)}\cup A_{2}^{(k+1)} \right| <\left| A_{1}^{(k)}\cup A_{2}^{(k)} \right| \). Since \({{A}_{1}}\cap {{A}_{2}}=\varnothing \), it follows that \(A_{1}^{(k)}\cap A_{2}^{(k)}=\varnothing \).

On the other hand,

$$\begin{aligned} A_{1}^{(k+1)},A_{2}^{(k+1)}\subseteq conv\text { }A_{1}^{(k)}\cap conv\text { }A_{2}^{(k)}. \end{aligned}$$
(6)

Let’s show that there is a point \(x\in A_{1}^{(k)}\cup A_{2}^{(k)}\) such that \(x\notin conv\ A_{1}^{(k)}\cap conv\ A_{2}^{(k)}\). Let’s assume the opposite:

$$\begin{aligned} {\left\{ \begin{array}{ll} A_{1}^{(k)}\subseteq conv\text { }A_{1}^{(k)}\cap conv\text { }A_{2}^{(k)}, \\ A_{2}^{(k)}\subseteq conv\text { }A_{1}^{(k)}\cap conv\text { }A_{2}^{(k)}. \end{array}\right. } \end{aligned}$$
(7)

Therefore, we have

$$\begin{aligned} {\left\{ \begin{array}{ll} conv\text { }A_{1}^{(k)}\subseteq conv\text { }A_{1}^{(k)}\cap conv\text { }A_{2}^{(k)}, \\ conv\text { }A_{2}^{(k)}\subseteq conv\text { }A_{1}^{(k)}\cap conv\text { }A_{2}^{(k)}. \end{array}\right. } \end{aligned}$$
(8)

On the other hand, by the definition of a convex hull, we get

$$\begin{aligned} {\left\{ \begin{array}{ll} conv\text { }A_{1}^{(k)}\supseteq conv\text { }A_{1}^{(k)}\cap conv\text { }A_{2}^{(k)}, \\ conv\text { }A_{2}^{(k)}\supseteq conv\ A_{1}^{(k)}\cap conv\text { }A_{2}^{(k)}. \end{array}\right. } \end{aligned}$$
(9)

From (8) and (9) there follows that

$$\begin{aligned} conv\text { }A_{1}^{(k)}=conv\text { }A_{2}^{(k)}. \end{aligned}$$
(10)

From (10) there follows that

$$\begin{aligned} {\left\{ \begin{array}{ll} ext\left( conv\ A_{1}^{(k)} \right) =ext\left( conv\ A_{2}^{(k)} \right) \subseteq A_{1}^{(k)}, \\ ext\left( conv\ A_{1}^{(k)} \right) =ext\left( conv\ A_{2}^{(k)} \right) \subseteq A_{2}^{(k)}. \end{array}\right. } \end{aligned}$$
(11)

Hence, \(A_{1}^{(k)}\cap A_{2}^{(k)}\ne \varnothing \), which contradicts the assumption above. Thus, the proposition is proved.

Let’s consider the case \(m>2\).

Just as in the case \(m=2\), the solution of the multiclass pattern recognition problem is reduced to solving a series of similar problems of sequentially decreasing size. To characterize such a problem, we need to specify the following.

$$\begin{aligned} \left. \begin{aligned}&x^{'}\in {{\mathbb {R}}^{n}}\ \text {(the point to be classified)}, \\&X^{'}\subseteq {{\mathbb {R}}^{n}}\ \text {(the current search region)}, \\&J^{'}\subseteq J=[1,m]\ \text {(the set of remaining class indices)}, \\&\mathrm {A}^{'}=\left\{ A_i^{'}:i\in J^{'}\right\} \ \text {(the current training subsamples)}, \\&C\left( \mathrm {A}^{'}\right) =\left\{ C_i^{'}=conv\ A_i^{'}:i\in J^{'}\right\} . \end{aligned} \right\} \end{aligned}$$

(12)

Let’s denote by \(\left\langle x^{'}, X^{'}, J^{'}, \mathrm {A}^{'}, C\left( \mathrm {A}^{'}\right) \right\rangle \) the problem of determining whether a point \(x^{'}\in X^{'}\) belongs to one of the classes \({{J}^{'}}\subset J\), provided by training sample \(\mathrm {A}^{'}\) with a set of convex hulls \(C\left( \mathrm {A}^{'}\right) \).

Further classification of the point \({{x}^{'}}\in {{X}^{'}}\) will be determined by the value

$$ {{M}^{'}}=\left| \left\{ i\in {{J}^{'}}:{{x}^{'}}\in C_{i}^{'} \right\} \right| $$

and breaks up into three cases: \(M^{'}=0\), \({{M}^{'}}=1\) and \({{M}^{'}}>1\). The rules for obtaining the problem \(\left\langle {{x}^{''}},{{X}^{''}},{{J}^{''}},\mathrm {A}^{''}, C\left( \mathrm {A}^{''}\right) , M^{''}\right\rangle \) in the case \({{M}^{'}}>1\) are as follows:

$$\begin{aligned} \left. \begin{aligned}&x^{''}=x^{'}, \\&J^{''}=\left\{ i\in J^{'}:x^{''}\in C_i^{'}\right\} , \\&M^{''}=\left| J^{''}\right| , \\&X^{''}=\cap \left\{ C_i^{'}:i\in J^{''}\right\} , \\&\mathrm {A}^{''}=\left\{ A_i^{''}=A_i^{'}\cap X^{''}:i\in J^{''}\right\} , \\&C\left( \mathrm {A}^{''}\right) =\left\{ C_i^{''}=conv\ A_i^{''}:i\in J^{''}\right\} . \\ \end{aligned} \right\} \end{aligned}$$
(13)

Thus, the decision rule for a multiclass pattern recognition problem based on convex hulls can be represented as a hierarchical tree of basic problems of the form (12), whose root is the problem \(Z=\left\langle x,\ {{\mathbb {R}}^{n}},\ J=[1, m], \mathrm {A}, C\left( \mathrm {A}\right) \right\rangle \).

Let’s denote by \(Z\left( {{J}^{'}} \right) \) the problem of the form (13) obtained from the problem Z for a set \(J^{'}\subseteq J\) such that

(14)

Let \(\left\{ J_{1}^{'},\ldots ,J_{{{k}_{1}}}^{'} \right\} \) be the family of all subsets \({{J}^{'}}\subseteq J\) satisfying (14). Then for the first-level problem Z we get \(k_1\) second-level problems of the form \(Z\left( J_i^ {'}\right) \), \(i\in [1, k_1]\). For each second-level problem \(Z\left( J_{i}^{'} \right) \), a series of next-level problems of the form \(Z\left( J_{i}^{'} \right) \left( J_{i}^{''} \right) \) is obtained, and so on. A vertex of this hierarchical tree becomes terminal if the subsample involved in its formulation is included in no more than one of the convex hulls involved in its formulation. Thus, to construct a decision rule, one needs to build a hierarchical graph of problems of the form (12), constructing convex hulls for the subsamples located in at least two convex hulls of the generating problem. To implement such an algorithm, efficient methods for solving the following problems are necessary.

  1.

    Let a finite set \(\mathrm {A}\subseteq {{\mathbb {R}}^{n}}\) be given. You need to find all extreme points of its convex hull \(ext\left( conv\ A \right) \).

    To determine whether a point of a finite set \(\mathrm {A}\) is an extreme one, one can solve the following problem LP1 from  [20] (see also  [24]).

    Let \(a_j\) denote an element of \(\mathrm {A}\).

    $$ \min x_j:\sum _{i\in I}x_ia_i=a_j, \sum _{i\in I}x_i=1, x_i\geqslant 0\;\forall \;i\in I, $$

    where I denotes the index set of the points of \(\mathrm {A}\). The point \(a_j\) is an extreme point of \(\mathrm {A}\) if and only if the optimal value of LP1 is positive.

    It should also be mentioned that  [20] provides an efficient algorithm for detecting all extreme points of a finite set \(\mathrm {A}\) by solving a sequence of problems of the form LP1.
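LP1 is straightforward to set up with an off-the-shelf LP solver. The sketch below is an assumption-laden illustration (the helper name `is_extreme` is ours, and `scipy.optimize.linprog` is assumed available): it minimizes \(x_j\) over all convex combinations representing \(a_j\) and declares \(a_j\) extreme when the optimum is positive.

```python
import numpy as np
from scipy.optimize import linprog

def is_extreme(points, j, tol=1e-9):
    """LP1: minimize x_j subject to sum_i x_i a_i = a_j, sum_i x_i = 1,
    x_i >= 0. points[j] is extreme iff the optimal value exceeds 0."""
    A = np.asarray(points, dtype=float)
    N = len(A)
    c = np.zeros(N)
    c[j] = 1.0                              # objective: minimize x_j
    A_eq = np.vstack([A.T, np.ones(N)])     # convex-combination constraints
    b_eq = np.append(A[j], 1.0)
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * N)
    return res.fun > tol                    # LP1 is always feasible (x_j = 1)

# corners of the unit square are extreme, the centre point is not
pts = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
extreme = [j for j in range(len(pts)) if is_extreme(pts, j)]
```

Running `is_extreme` for every index, as in the last line, is the naive version of the sequential scheme from [20].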

  2.

    Let a point x and the set \(ext\left( M \right) \) of extreme points of a polyhedron M be given. You need to determine whether the point x belongs to the polyhedron M, i.e. whether \(x\in conv \ ext\left( M \right) \).

    Problem LP2 below can be used for this purpose.

    Let \(x\in \mathbb {R}^n\) and \(\mathrm {A}=\left\{ a_1, a_2, \dots , a_m\right\} \subseteq \mathbb {R}^n\), and suppose one needs to determine whether the point x belongs to \(conv\,\mathrm {A}\).

    Let’s consider the following system.

    $$\begin{aligned} {\left\{ \begin{array}{ll} \sum \limits _{i=1}^m \alpha _i a_i=x,\\ \sum \limits _{i=1}^m \alpha _i=1,\\ \alpha _i\geqslant 0, i\in [1, m]. \end{array}\right. } \end{aligned}$$
    (15)

    Obviously, \(x\in conv\,\mathrm {A}\) if and only if the system above is feasible. On the other hand, this system can be transformed into the linear program LP2 of the form:

    $$\begin{aligned} {\left\{ \begin{array}{ll} v+w\;\longrightarrow \;\min ,\\ \sum \limits _{i=1}^m \alpha _i a_i=x,\\ \sum \limits _{i=1}^m \alpha _i+v-w=1,\\ \alpha _i\geqslant 0, i\in [1,m],\\ v\geqslant 0, w\geqslant 0. \end{array}\right. } \end{aligned}$$
    (16)

    where v and w are correcting variables for the case when the system (15) is infeasible. Thus, the point x belongs to \(conv\,\mathrm {A}\) if and only if LP2 is feasible and its optimal value \(v+w\) equals 0.
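The program (16) translates directly into code. The following is a sketch under the same assumptions as before (hypothetical helper name, `scipy.optimize.linprog` assumed available); it stacks the variables as \((\alpha_1,\dots,\alpha_m,v,w)\) and reports membership when LP2 is feasible with optimal value 0.

```python
import numpy as np
from scipy.optimize import linprog

def in_conv_hull(x, points, tol=1e-9):
    """LP2 from (16): minimize v + w subject to sum_i alpha_i a_i = x,
    sum_i alpha_i + v - w = 1, all variables >= 0."""
    A = np.asarray(points, dtype=float)
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [1.0, 1.0]])   # minimize v + w
    A_eq = np.zeros((n + 1, m + 2))
    A_eq[:n, :m] = A.T                              # sum alpha_i a_i = x
    A_eq[n, :m] = 1.0                               # sum alpha_i + v - w = 1
    A_eq[n, m], A_eq[n, m + 1] = 1.0, -1.0
    b_eq = np.append(np.asarray(x, dtype=float), 1.0)
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (m + 2))
    return res.status == 0 and res.fun <= tol

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
```

For example, for the unit square above, a point inside it yields the optimal value 0, while a point outside either gives a positive optimum or makes LP2 infeasible.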

  3.

    Let a point b and the set \(ext\left( M \right) \) of extreme points of a polyhedron M be given.

    You need to find the shortest distance from the point b to M, i.e. \(\rho \left( b, M \right) = \min \left\{ \rho \left( b, y \right) :y\in M \right\} \).

    The following quadratic programming problem can be used to solve this problem:

    $$\begin{aligned} {\left\{ \begin{array}{ll} \sum \limits _{i=1}^{n}{{{\left( {{x}_{i}}-{{b}_{i}} \right) }^{2}}}\ \rightarrow \ \min , \\ \sum \limits _{j=1}^{m}{{{\alpha }_{j}}}\cdot {{a}_{j}}=x, \\ \sum \limits _{j=1}^{m}{{{\alpha }_{j}}}=1, \\ {{\alpha }_{j}}\geqslant 0,\ j\in [1,m]. \\ \end{array}\right. } \end{aligned}$$
    (17)

    Then the required shortest distance from the point b to the convex hull of the finite set A in the space \(\mathbb {R}^n\), where x denotes the optimal solution of (17), equals

    $$ \rho \left( b,conv\ A \right) =\sqrt{\sum \limits _{i=1}^{n}{{{\left( {{x}_{i}}-{{b}_{i}} \right) }^{2}}}}. $$
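The QP (17) can be solved with any quadratic or general nonlinear solver. A minimal sketch, assuming `scipy.optimize.minimize` with the SLSQP method is available (the helper name `dist_to_hull` is ours): it optimizes over the barycentric coordinates \(\alpha\) with \(x=\sum_j \alpha_j a_j\) substituted into the objective.

```python
import numpy as np
from scipy.optimize import minimize

def dist_to_hull(b, points):
    """QP (17): minimize ||x - b||^2 over x = sum_j alpha_j a_j with
    sum_j alpha_j = 1, alpha_j >= 0; returns the distance rho(b, conv A)."""
    A = np.asarray(points, dtype=float)
    b = np.asarray(b, dtype=float)
    m = len(A)
    objective = lambda alpha: np.sum((A.T @ alpha - b) ** 2)
    res = minimize(objective, np.full(m, 1.0 / m), method="SLSQP",
                   bounds=[(0.0, None)] * m,
                   constraints=[{"type": "eq",
                                 "fun": lambda alpha: np.sum(alpha) - 1.0}])
    return float(np.sqrt(max(res.fun, 0.0)))  # guard tiny negative round-off

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
```

For instance, the distance from (2, 0.5) to the unit square above is 1 (the nearest hull point is (1, 0.5)), and a point inside the hull gives distance 0 up to solver tolerance.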

2 Application of the CSS Machine Learning Algorithm

Let’s consider several applied problems for which the proposed CSS machine learning algorithm can be used: bank scoring  [9], analysis of financial markets  [10, 11], medical diagnostics, non-destructive testing, and the search for reference clients for marketing activities in social networks.

Problem 1. The Classical Problem of Fisher's Irises  [16]

There is a training sample of 150 objects in the space \({{\mathbb {R}}^{4}}\), which is divided into 3 classes: class \({{A}_{1}}\)—Setosa, class \({{A}_{2}}\)—Versicolor and class \(A_3\)—Virginica, each containing 50 objects. It turns out that this well-known classical problem is CSS-solvable:

$$ {\left\{ \begin{array}{ll} conv\ {{A}_{1}}\cap \left( {{A}_{2}}\cup {{A}_{3}} \right) =\varnothing , \\ conv\ {{A}_{2}}\cap {{A}_{3}}=\varnothing . \\ \end{array}\right. } $$

In this case, the decision rule class(x) looks as follows:

$$ class(x)= {\left\{ \begin{array}{ll} 1,&{}\text {if}\ x\in conv\ {{A}_{1}}, \\ 2,&{}\text {if}\ x\in conv\ {{A}_{2}},\ x\not \in conv\ {{A}_{1}}, \\ 3,&{}\text {if}\ x\in conv\ {{A}_{3}},\ x\not \in conv\ {{A}_{1}}\cup conv\ {{A}_{2}}, \\ \arg \underset{i\in \{1,2,3\}}{\mathop {\min }}\,\rho \left( x,conv\ {{A}_{i}} \right) ,&{}\text {if}\ x\not \in conv\ {{A}_{1}}\cup conv\ {{A}_{2}}\cup conv\ {{A}_{3}}. \\ \end{array}\right. } $$
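A decision rule of this shape is easy to assemble from the two auxiliary routines of Sect. 1. The sketch below does not use the actual Iris measurements; it runs on three synthetic, pairwise disjoint 2-D classes standing in for \(A_1, A_2, A_3\), with hull membership via an LP feasibility test and hull distance via the QP (17), both through scipy (assumed available; all function names are ours).

```python
import numpy as np
from scipy.optimize import linprog, minimize

def in_hull(x, pts):
    # membership: does some alpha >= 0, sum(alpha) = 1, give sum alpha_j a_j = x?
    A = np.asarray(pts, dtype=float)
    m = len(A)
    res = linprog(np.zeros(m),
                  A_eq=np.vstack([A.T, np.ones(m)]),
                  b_eq=np.append(np.asarray(x, dtype=float), 1.0),
                  bounds=[(0, None)] * m)
    return res.status == 0

def dist_to_hull(x, pts):
    # distance rho(x, conv pts) via the QP (17) over barycentric coordinates
    A = np.asarray(pts, dtype=float)
    x = np.asarray(x, dtype=float)
    m = len(A)
    res = minimize(lambda a: np.sum((A.T @ a - x) ** 2),
                   np.full(m, 1.0 / m), method="SLSQP",
                   bounds=[(0.0, None)] * m,
                   constraints=[{"type": "eq",
                                 "fun": lambda a: np.sum(a) - 1.0}])
    return float(np.sqrt(max(res.fun, 0.0)))

def classify_css(x, classes):
    """Scan the classes in their CSS order; the first hull containing x
    decides, otherwise fall back to the nearest convex hull."""
    for i, pts in enumerate(classes, start=1):
        if in_hull(x, pts):
            return i
    return 1 + int(np.argmin([dist_to_hull(x, pts) for pts in classes]))

# three synthetic classes with pairwise disjoint convex hulls
A1 = [(0, 0), (1, 0), (1, 1), (0, 1)]
A2 = [(3, 0), (4, 0), (4, 1), (3, 1)]
A3 = [(6, 0), (7, 0), (7, 1), (6, 1)]
```

A point inside a hull is classified by membership alone; a point such as (5.2, 0.5), which lies in no hull, is assigned to class 3 because \(conv\ A_3\) is the nearest hull.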

Problem 2

The proposed approach was used to develop a strategy for trading shares of the Bank of the Russian FederationFootnote 1\(^,\)Footnote 2. Five stock market indicators were selected as input parameters; Table 1 below provides a description of these parameters.

Table 1. Description of features

The following object classes were required to be recognized:

  1.

    Class Yes—the set of positions of the trading strategy that were closed with a profit, where the profit exceeded the maximum loss over the period of holding the position.

  2.

    Class No—the set of positions of the trading strategy that were closed with a loss, or whose profit was less than the maximum loss over the period of holding the position.

Corresponding classes were formed based on real data obtained in the period from 26.02.10 until 03.10.19.

The cardinalities of the obtained sets, as well as the numbers of extreme points and convex-hull memberships, are shown in Table 2 below.

Table 2. Description of obtained results

From the table above one can conclude that a position should be opened if and only if the current state falls into the convex hull of the class Yes at Level 1 or 2; in all other cases the risk is very high.

Problem 3

The convex hulls method was used for solving bank scoring problems. Let’s describe the most representative favorable and unfavorable cases we have met.

Favorable Case. There are 6 input parameters, all related to the financial well-being of the borrower. Data from the first stage of calculations are shown in Table 3.

Table 3. Favorable case. First stage

Then the procedure is repeated for the next 9858 non-default and 242 default items. We will not describe all the stages, but it should be mentioned that an acceptable solution was obtained after 7 iterations.

Unfavorable Case. There are 5 input parameters (loan amount, loan term, borrower age, loan amount-to-age ratio, loan amount-to-loan term ratio). Data from the first stage of calculations are shown in Table 4.

Table 4. Unfavorable case. First stage

In this case, the convex hulls of the default and non-default sets intersect significantly, which is due to the specifics of the problem (the share of default loans is 2.1%) and to the small number of explanatory features. In future work we plan to develop a method for similar problems (when one set belongs entirely to the other and is strongly blurred in it). In particular, we plan to consider the problem of determining the balance between the percentage of points included in a convex hull and the size of this convex hull.

Naturally, practical situations are much more complicated, but the sequence of actions described above allows one to obtain an efficient decision rule.

Conclusion

The paper proposes an approach to solving multiclass pattern recognition problems in a geometric formulation based on convex hulls and convex separable sets (CS-sets). Such problems often arise in financial mathematics, for example in bank scoring and market analysis, as well as in various areas of diagnostics and forecasting. The main idea of the proposed approach is as follows: if, for the given family of points \(\mathrm {A}\) separated into m classes \(A_i\), \(i\in \left[ 1 , m \right] \), each convex hull \(conv\ A_{i}\) contains only points of the class \({{A}_i}\), then we suppose that any point \(x\in conv\ A_i\) represents a vector belonging to the class \(A_i\). The paper introduces the key definition of a convex separable set (CSS) for a family \(\mathrm {A}=\left\{ A_1,\ldots ,A_m\right\} \) of subsets of \({{\mathbb {R}}^{n}}\), and, based on it, the definition of a CSS-solvable family. The advantages of the proposed method are the uniqueness of the resulting solution and the unique assignment of each point of the source space to one of the classes; the approach also allows one to filter the source data for outliers. Computational experiments with the developed approach were carried out on academic examples and test data from public libraries. The efficiency and power of the proposed approach are demonstrated on the classical Fisher's Iris problem  [16] as well as on several applied economic problems. In particular, it is shown that the classical Fisher's Iris problem  [16] is CSS-solvable, which suggests high practical efficiency of the proposed method.