1 Introduction

Classification problems in mathematical programming concern the separation of sample sets by means of an appropriate surface. This field, which many researchers from the optimization community have entered in recent years, is part of the broader machine learning area, aimed at providing automated systems able to learn from human experience.

In machine learning, classification can be addressed on the basis of different paradigms (Astorino et al. 2008). The most common one is the supervised approach, where the samples in each set are equipped with a class label that is heavily exploited in the learning phase. A well-established supervised technique is the support vector machine (SVM) (Cristianini and Shawe-Taylor 2000; Vapnik 1995), which has proved to be a powerful classification tool in many application areas. A widely adopted alternative is the unsupervised approach, where no class label is known in advance and the aim is to cluster the data on the basis of their similarities (Celebi 2015). In the middle, we find the semisupervised techniques (Chapelle et al. 2006), a compromise between the supervised and unsupervised approaches; in this case, the learning task exploits the overall information coming from both labeled and unlabeled samples. Some useful references are Chapelle and Zien (2005) and Astorino and Fuduli (2007), the latter being a semisupervised version of the SVM technique.

A more recent classification framework is multiple instance learning (MIL) (Herrera et al. 2016), which can be interpreted as a weakly supervised approach; it consists in categorizing bags of samples when only the class label of each bag, rather than of each sample inside it, is available. A seminal SVM-type MIL paper is Andrews et al. (2003), while some recent articles are Astorino et al. (2018, 2019a, 2019b, 2020), Avolio and Fuduli (2020), Gaudioso et al. (2020), and Plastria et al. (2014).

In this work, we present an extension of the supervised binary classification approach reported in Astorino and Gaudioso (2009) and based on the spherical separation of two finite sets of samples (points in \(\mathrm{{I\!R}}^n\)), say

$$\begin{aligned} {\mathcal {A}}=\{a_1,\ldots ,a_m\}, \text{ with } a_i\in \mathrm{{I\!R}}^n,~i=1,\ldots ,m \end{aligned}$$

and

$$\begin{aligned} {\mathcal {B}}=\{b_1,\ldots ,b_k\}, \text{ with } b_l\in \mathrm{{I\!R}}^n,~l=1,\ldots ,k. \end{aligned}$$

As initially proposed in Tax and Duin (1999) and also in Astorino and Gaudioso (2009), the objective is to find a minimal volume sphere separating the two sets \({\mathcal {A}}\) and \({\mathcal {B}}\), i.e., a sphere enclosing all points of \({\mathcal {A}}\) and no points of \({\mathcal {B}}\). In particular, while in Tax and Duin (1999) the optimization problem is characterized by \(n+1\) unknowns, the center of the sphere in \(\mathrm{{I\!R}}^n\) and the radius in \(\mathrm{{I\!R}}\), in Astorino and Gaudioso (2009) the center is prefixed; consequently, the problem becomes univariate, reducing to the computation of just one variable, the radius. For this problem, the authors designed a computationally fast algorithm which, despite the drastic simplification, provides reasonably good separation results, as long as the center is judiciously chosen. Possible choices for the center are, for example, the barycenter of one of the two sets, \({\mathcal {A}}\) or \({\mathcal {B}}\), or the barycenter of the set \({\mathcal {A}} \cup {\mathcal {B}}\) when there is no information on the geometry of the data.

In this paper, we use the same approach as in Astorino and Gaudioso (2009), computing the optimal radius of the separation sphere when the center is prefixed. The novelty of our proposal consists in placing the center infinitely far from the two sets, by exploiting the new numeral system based on the grossone theory (Sergeyev 2017).

Spherical separation falls into the class of nonlinear separation surfaces (Astorino et al. 2008, 2016), differently, for example, from the well-known SVM technique (Cristianini and Shawe-Taylor 2000; Vapnik 1995), where a classifier is constructed by generating a hyperplane far away from the points of the two sets. The SVM approach also allows one to obtain general nonlinear classifiers by adopting kernel transformations. In this case, the basic idea is to map the data into a higher-dimensional space (the feature space) and to separate the two transformed sets by means of a hyperplane, which corresponds to a nonlinear surface in the original input space. The main advantage of spherical separation is that, once the center of the sphere is heuristically fixed in advance, the optimal radius can be found quite effectively by means of a simple sorting algorithm, as in Astorino and Gaudioso (2009) and Astorino et al. (2012b). No analogous simplification strategy is apparently available if one adopts the SVM approach. Another advantage is the possibility of working directly in the input space: keeping the data in the original space, whenever possible, is appealing in order to stay close to the modeled real-life processes. Of course, kernel methods are characterized by high flexibility, even if they sometimes provide results which are hard to interpret in the original input space, differently from nonlinear classifiers acting directly in that space (see, e.g., Astorino et al. (2014b, 2014c)).

The paper is organized in the following way. In the next section, we recall the main concepts related to linear and spherical separability; in fact, a linear separation surface (i.e., a hyperplane) can be interpreted as a sphere characterized by an infinitely far center and, consequently, by an infinitely long radius. In Sect. 3, we summarize the main concepts of the grossone algebra, and in Sect. 4, we describe our approach, equipped with some numerical results obtained on a set of benchmark test problems drawn from the literature. Finally, some conclusions are reported in the last section.

Throughout the paper, we use the following notation. We indicate by \(\Vert \cdot \Vert \) the Euclidean norm, and given a set \({\mathcal {X}}\), we denote by \(\mathrm{conv}({\mathcal {X}})\) the convex hull of \({\mathcal {X}}\).

2 Linear and spherical separation

A seminal paper on linear separation is Mangasarian (1965), while the first approach to pattern classification based on a minimum volume sphere dates back to Tax and Duin (1999).

2.1 Linear separability

Two sets \({\mathcal {A}}\) and \({\mathcal {B}}\) are linearly separable if and only if there exists a hyperplane

$$\begin{aligned} H(w,\gamma )=\{x\in \mathrm{{I\!R}}^n \;|\;w^Tx=\gamma \}, \text{ with } w\in \mathrm{{I\!R}}^n \text{ and } \gamma \in \mathrm{{I\!R}}, \end{aligned}$$

such that

$$\begin{aligned} w^Ta_i\le \gamma -1 \quad \quad i=1,...,m \end{aligned}$$

and

$$\begin{aligned} w^Tb_l\ge \gamma +1 \quad \quad \,l=1,...,k. \end{aligned}$$

A characterization of linear separability is given by the following condition:

$$\begin{aligned} \mathrm{conv}({\mathcal {A}}) \cap \mathrm{conv}({\mathcal {B}})=\emptyset , \end{aligned}$$

which is well depicted in Fig. 1, where the cases of linearly separable and inseparable sets are shown, respectively.
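In practice, this characterization can be checked by solving a feasibility linear program over the inequalities above. The following MATLAB sketch (a minimal illustration of ours, assuming the Optimization Toolbox function linprog is available) returns a separating hyperplane whenever one exists:

```matlab
% Check linear separability of the sets A (m x n) and B (k x n) by solving
% the feasibility LP  w'*a_i <= gamma - 1,  w'*b_l >= gamma + 1.
function [w, gamma] = linsep(A, B)
    [m, n] = size(A);
    k = size(B, 1);
    % Unknowns v = [w; gamma]; each inequality is written as [...]*v <= -1.
    Acon = [A, -ones(m, 1); -B, ones(k, 1)];
    rhs = -ones(m + k, 1);
    f = zeros(n + 1, 1);                 % pure feasibility: zero objective
    opts = optimoptions('linprog', 'Display', 'none');
    [v, ~, exitflag] = linprog(f, Acon, rhs, [], [], [], [], opts);
    if exitflag == 1
        w = v(1:n); gamma = v(n + 1);
    else
        w = []; gamma = [];              % no separating hyperplane exists
    end
end
```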

Fig. 1 Linear separability: (a) separable sets; (b) inseparable sets

Fig. 2 Spherical separability: (a) separable sets; (b), (c) inseparable sets

In linear separability, a relevant role is played by the SVM technique (Cristianini and Shawe-Taylor 2000; Vapnik 1995), which provides a classifier characterized by a good generalization capability, i.e., the ability to correctly classify new sample points. This approach consists in constructing a separation hyperplane far away from the points of both sets \({\mathcal {A}}\) and \({\mathcal {B}}\), by minimizing the following error function:

$$\begin{aligned}&\displaystyle \min _{w,\gamma }\frac{1}{2}\Vert w\Vert ^2+C \displaystyle \sum _{i=1}^m\max \{0,a_i^Tw-\gamma +1\} \\&\quad + C \displaystyle \sum _{l=1}^k\max \{0,-b_l^Tw+\gamma +1\}, \end{aligned}$$

where the minimization of the first term corresponds to the maximization of the margin (i.e., the distance between two parallel hyperplanes supporting the sets), while the last two terms represent the misclassification errors on the two point sets \({\mathcal {A}}\) and \({\mathcal {B}}\), respectively. The parameter C is a positive constant giving the trade-off between these two objectives.
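As a concrete illustration, this objective can be minimized, e.g., by a plain subgradient method. The following MATLAB sketch (an illustrative choice of ours, with diminishing step sizes; not the solver used in the references) makes the computation explicit:

```matlab
% Subgradient sketch for the SVM error function above.
function [w, gamma] = svm_subgrad(A, B, C, T)
    [m, n] = size(A); k = size(B, 1);
    w = zeros(n, 1); gamma = 0;
    for t = 1:T
        gw = w; ggamma = 0;                  % gradient of (1/2)*||w||^2
        for i = 1:m                          % terms max{0, a_i'*w - gamma + 1}
            if A(i, :) * w - gamma + 1 > 0
                gw = gw + C * A(i, :)'; ggamma = ggamma - C;
            end
        end
        for l = 1:k                          % terms max{0, -b_l'*w + gamma + 1}
            if -B(l, :) * w + gamma + 1 > 0
                gw = gw - C * B(l, :)'; ggamma = ggamma + C;
            end
        end
        eta = 1 / t;                         % diminishing step size
        w = w - eta * gw; gamma = gamma - eta * ggamma;
    end
end
```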

2.2 Spherical separability

In the spherical separation case, the pursued idea is to find a sphere

$$\begin{aligned} S(x_0,R)=\{x\in \mathrm{{I\!R}}^n \;|\;\Vert x-x_0\Vert ^2 = R^2\}, \end{aligned}$$

with center \(x_0\in \mathrm{{I\!R}}^n\) and radius R, enclosing all points of \({\mathcal {A}}\) and no points of \({\mathcal {B}}\). More formally, the two sets \({\mathcal {A}}\) and \({\mathcal {B}}\) are spherically separated by \(S(x_0,R)\) if and only if

$$\begin{aligned} \Vert a_i-x_0\Vert ^2\le R^2 \quad \quad i=1,...,m \end{aligned}$$

and

$$\begin{aligned} \Vert b_l-x_0\Vert ^2\ge R^2 \quad \quad l=1,...,k. \end{aligned}$$

We observe that in this case, the role played by the two sets is not symmetric; in fact, a necessary (but not sufficient) condition for the existence of a separation sphere is the following (Fig. 2):

$$\begin{aligned} \mathrm{conv}({\mathcal {A}})\cap {\mathcal {B}} = \emptyset . \end{aligned}$$

Based on the above spherical separability definition, the classification error associated with any sphere \(S(x_0,R)\) is

$$\begin{aligned} \displaystyle \sum _{i=1}^m\max \{0,\Vert a_i-x_0\Vert ^2-R^2\}+\displaystyle \sum _{l=1}^k\max \{0, R^2-\Vert b_l-x_0\Vert ^2\}. \end{aligned}$$
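In MATLAB terms, this error can be evaluated directly from the squared distances. A small sketch of ours (using implicit expansion, so x0 is taken as a column vector) is:

```matlab
% Classification error of the sphere S(x0, R) on the sets A and B (z = R^2).
function e = sphere_error(A, B, x0, z)
    dA = sum((A - x0.').^2, 2);   % squared distances ||a_i - x0||^2
    dB = sum((B - x0.').^2, 2);   % squared distances ||b_l - x0||^2
    e = sum(max(0, dA - z)) + sum(max(0, z - dB));
end
```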

To take into account the generalization capability, in Astorino and Gaudioso (2009) the authors proposed to construct a minimal volume separation sphere by solving the following problem:

$$\begin{aligned}&\displaystyle \min _{x_0,z} z+C\displaystyle \sum _{i=1}^m\max \{0,\Vert a_i-x_0\Vert ^2-z\} \nonumber \\&\quad + C\displaystyle \sum _{l=1}^k\max \{0, z-\Vert b_l-x_0\Vert ^2\}, \end{aligned}$$
(1)

where \(z{\mathop {=}\limits ^{\triangle }}R^2 \ge 0\) and \(C > 0\) is the parameter tuning the trade-off between the minimization of the volume and the minimization of the classification error.

Some works devoted to spherical separation are Astorino and Gaudioso (2009), Astorino et al. (2010, 2012a, 2012b, 2014a, 2017), and Le Thi et al. (2013). In particular, the approach we propose in this paper is based on the fixed-center algorithm introduced in Astorino and Gaudioso (2009), where the center \(x_0\) of the sphere is assumed to be fixed (e.g., equal to the barycenter of \({\mathcal {A}}\)). Denoting by p the cardinality of the larger of the two sets \({\mathcal {A}}\) and \({\mathcal {B}}\), it is easy to see that, whenever \(x_0\) is fixed, problem (1) reduces to a univariate, convex, nonsmooth optimization problem, which can be rewritten as a structured linear program whose dual is solvable in \(O(p \log p)\) time. In fact, the optimal value of the variable z (the square of the radius) can be computed by simply comparing the distances, preliminarily sorted, between the center \(x_0\) and each point of the two sets. For further technical details on this approach, we refer the reader directly to Astorino and Gaudioso (2009).
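Since the fixed-center objective is piecewise linear and convex in z, its minimum is attained at \(z=0\) or at one of the squared distances. A direct MATLAB sketch of ours, reusing the sphere_error sketch above and evaluating all breakpoints (a single scan over the sorted distances would attain the \(O(p \log p)\) bound of the original algorithm), is the following:

```matlab
% Fixed-center spherical separation: with x0 fixed, problem (1) is a
% univariate piecewise-linear convex problem in z = R^2, whose minimum
% is attained at z = 0 or at one of the squared distances (breakpoints).
function [z, R] = fixed_center_sphere(A, B, x0, C)
    dA = sum((A - x0.').^2, 2);
    dB = sum((B - x0.').^2, 2);
    cand = [0; dA; dB];                  % breakpoints of the objective
    best = inf; z = 0;
    for j = 1:numel(cand)
        obj = cand(j) + C * sphere_error(A, B, x0, cand(j));
        if obj < best
            best = obj; z = cand(j);
        end
    end
    R = sqrt(z);
end
```

For instance, a call of the form [z, R] = fixed_center_sphere(A, B, mean(A, 1).', 10) fixes the center at the barycenter of \({\mathcal {A}}\).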

2.3 Spherical separation versus linear separation

From the mathematical point of view, both approaches (linear and spherical separation) are characterized by the same number of parameters to be computed; in fact, a separation hyperplane is identified by its normal and its bias, while a sphere is computed by determining its center and its radius. From this perspective, a hyperplane can be viewed as a particular sphere whose center is infinitely far (Fig. 3).

Fig. 3 Linear separability versus spherical separability

A possible choice of the center \(x_0\) is a point far from both sets \({\mathcal {A}}\) and \({\mathcal {B}}\), i.e.,

$$\begin{aligned} x_0=x_0^{{\mathcal {A}}}+M \left( x_0^{{\mathcal {A}}}-x_0^{{\mathcal {B}}}\right) , \end{aligned}$$
(2)

where

$$\begin{aligned} x_0^{{\mathcal {A}}}{\mathop {=}\limits ^{\triangle }}\displaystyle \frac{1}{m}\sum _{i=1}^m a_i \quad \text{ and } \quad x_0^{{\mathcal {B}}}{\mathop {=}\limits ^{\triangle }}\displaystyle \frac{1}{k}\sum _{l=1}^k b_l \end{aligned}$$

are the barycenters of \({\mathcal {A}}\) and \({\mathcal {B}}\), respectively, while M is a sufficiently large positive parameter, commonly named “big M”.

Formula (2) corresponds to computing \(x_0\) from \(x_0^{{\mathcal {A}}}\) along the direction \(x_0^{{\mathcal {A}}}-x_0^{{\mathcal {B}}}\) with stepsize equal to M (Fig. 4).

Fig. 4 Spherical separability with a far center

Notice that, in general, the “big M” constant is not easy to manage from the numerical point of view, since it is not evident how to quantify the minimum threshold above which M can be considered sufficiently big; as a consequence, in practical cases, many trial values need to be tested. A possible way to overcome this numerical difficulty is to obtain an infinitely far center by exploiting the new grossone theory and simply setting M equal to \({\textcircled {1}}\). The symbol \({\textcircled {1}}\) denotes the new numeral, called grossone, which is the object of the next section. Differently from Astorino and Gaudioso (2009), where various values of M in formula (2) were tested in order to obtain a good classification performance, a remarkable advantage of using grossone is that it avoids the need to repeat several tests with larger and larger values of M.

3 Generality on the grossone algebra

Grossone, denoted by the symbol \({\textcircled {1}}\), has been introduced as the basic element of a new numeral system that makes it possible to express not only finite but also infinite and infinitesimal quantities, and to execute numerical computations with all of them within a unique framework. An explicative recent survey is Sergeyev (2017).

We remark that this new computational methodology is not related to nonstandard analysis (Sergeyev 2019), and it is noncontradictory, as studied in depth in Lolli (2015), Margenstern (2011), and Montagna et al. (2015). Moreover, a new supercomputer based on grossone has been conceived; it is called the Infinity Computer and is patented in several countries (Sergeyev 2010a).

In the literature, grossone has found many applications in various fields: in optimization (Cococcioni et al. 2020; De Cosmis and De Leone 2012; De Leone 2018; De Leone et al. 2020, 2018; Gaudioso et al. 2018; Sergeyev et al. 2018), in numerical differentiation (Sergeyev 2011), in ordinary differential equations (Iavernaro et al. 2020), and in many other theoretical and computational research areas, such as infinite series (Caldarola 2018; Sergeyev 2009, 2017, 2018; Zhigljavsky 2012). On the other hand, to the best of our knowledge, no paper has yet applied the grossone theory to classification problems.

Grossone is an infinite unit, defined as the number of elements of the set \(\mathbb {N}\) of the natural numbers. The new numeral \({\textcircled {1}}\) is introduced by describing its properties, postulated in the infinite unit axiom:

  1. Infinity: Any finite natural number n is less than \({\textcircled {1}}\).

  2. Identity: \(0\cdot {\textcircled {1}}={\textcircled {1}}\cdot 0=0,\; {\textcircled {1}}-{\textcircled {1}}=0,\; \displaystyle \frac{{\textcircled {1}}}{{\textcircled {1}}}=1, \; {\textcircled {1}}^0=1,\;1^{{\textcircled {1}}}=1,\; 0^{{\textcircled {1}}}=0.\)

  3. Divisibility: For any finite natural number n, the sets

     $$\begin{aligned} \mathbb {N}_{k,n}=\{k, k+n, k+2n, k+3n, \ldots \}, \; 1\le k\le n,\; \displaystyle \bigcup _{k=1}^n \mathbb {N}_{k,n}=\mathbb {N} \end{aligned}$$

     have a number of elements indicated by \(\displaystyle \frac{{\textcircled {1}}}{n}\).

In particular, this axiom states that, for any given finite integer n, the infinite number \(\frac{{\textcircled {1}}}{n}\) is an integer larger than any finite number. Adopted in addition to the standard axioms of the real numbers, it preserves all the basic properties, such as commutativity and associativity.

A general way to express infinite and infinitesimal numbers on a computer is provided in Sergeyev (2003, 2010b, 2015, 2017) by using a positional numeral system with the infinite base \({\textcircled {1}}\). A number Q in this numeral system can be represented by groups of powers of \({\textcircled {1}}\):

$$\begin{aligned} Q= & {} q_{p_h}{\textcircled {1}}^{p_h} + ... + q_{p_1}{\textcircled {1}}^{p_1} + q_{p_0}{\textcircled {1}}^{p_0} \nonumber \\&\quad + q_{p_{-1}}{\textcircled {1}}^{p_{-1}} + ... + q_{p_{-r}}{\textcircled {1}}^{p_{-r}}, \end{aligned}$$
(3)

where

  • h and r are natural numbers (\(\in \mathbb {N}\));

  • the exponents \( p_i\) (\( i=-r,...,-1,0,1,...,h\)), called gross-powers, can in turn be numbers of the same type as Q, and they are sorted in decreasing order

    $$\begin{aligned} p_h>p_{h-1}>\ldots>p_1>p_0>p_{-1}>\ldots>p_{-(r-1)}>p_{-r}, \end{aligned}$$

    with \({p_0} = 0\);

  • \(q_{p_i}\ne 0\) (\( i=-r,...,-1,0,1,...,h\)), called gross-digits, are finite numbers, positive or negative.

Some explicative examples of number representations in this numeral system are the following.

  • Finite numbers are represented by numerals with the highest gross-power equal to zero, e.g., \(-7.3=-7.3{\textcircled {1}}^0\).

  • Infinitesimal numbers are represented by numerals having only negative finite or infinite gross-powers. The simplest infinitesimal is \({\textcircled {1}}^{-1}=\frac{1}{{\textcircled {1}}}\), for which \({\textcircled {1}}^{-1} \cdot {\textcircled {1}} = 1\). Note that no infinitesimal is equal to zero; in particular, \({\textcircled {1}}^{-1} >0\), because it is the result of a division of two positive numbers.

  • Infinite numbers have at least one positive finite or infinite gross-power. For instance, the number \(23.65{\textcircled {1}}^{41.72{\textcircled {1}}}+45.13{\textcircled {1}}^{30.6}-12.27{\textcircled {1}}^{-22.1}\) is infinite; it consists of two infinite parts and one infinitesimal part.
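To fix ideas, gross-numbers with finite gross-powers can be emulated by storing the pairs (gross-digit, gross-power) as the rows of an array, sorted by decreasing gross-power. The following toy MATLAB sketch of ours (a simplified stand-in for the C++ library used in Sect. 4, restricted to finite gross-powers) implements addition and comparison under these assumptions:

```matlab
% Toy gross-number arithmetic: a number is an s-by-2 array [digit, power]
% with rows sorted by decreasing (finite) gross-power.
function z = gross_add(x, y)
    z = [x; y];
    [~, idx] = sort(z(:, 2), 'descend');
    z = z(idx, :);
    j = 1;
    while j < size(z, 1)                 % merge rows with equal gross-powers
        if z(j, 2) == z(j + 1, 2)
            z(j, 1) = z(j, 1) + z(j + 1, 1);
            z(j + 1, :) = [];
        else
            j = j + 1;
        end
    end
    z(z(:, 1) == 0, :) = [];             % drop zero gross-digits
end

function c = gross_cmp(x, y)             % 1 if x > y, -1 if x < y, 0 if x = y
    d = gross_add(x, [-y(:, 1), y(:, 2)]);   % difference x - y
    if isempty(d), c = 0;
    else, c = sign(d(1, 1));             % sign of the leading gross-digit
    end
end
```

For example, gross_cmp([1 1], [1e9 0]) returns 1, since \({\textcircled {1}}\) dominates any finite number.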

We conclude by remarking that the grossone numeral system makes it easy to manage all these types of computations, since infinite and infinitesimal values can be assigned to quantities.

4 Computational experiments

We have tested the fixed-center spherical separation algorithm described in Astorino and Gaudioso (2009) for solving problem (1), by choosing the center of the sphere as follows:

$$\begin{aligned} x_0=x_0^{{\mathcal {A}}}+{\textcircled {1}} \left( x_0^{{\mathcal {A}}}-x_0^{{\mathcal {B}}}\right) , \end{aligned}$$

i.e., by setting \(M = {\textcircled {1}}\) in formula (2).
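With this choice, writing \(d{\mathop {=}\limits ^{\triangle }}x_0^{{\mathcal {A}}}-x_0^{{\mathcal {B}}}\), each squared distance appearing in problem (1) expands into a gross-number with finite coefficients:

$$\begin{aligned} \Vert a_i-x_0\Vert ^2=\Vert (a_i-x_0^{{\mathcal {A}}})-{\textcircled {1}}\,d\Vert ^2={\textcircled {1}}^2\Vert d\Vert ^2-2\,{\textcircled {1}}\,d^T(a_i-x_0^{{\mathcal {A}}})+\Vert a_i-x_0^{{\mathcal {A}}}\Vert ^2, \end{aligned}$$

and analogously for the points \(b_l\). Since the coefficient \(\Vert d\Vert ^2\) of \({\textcircled {1}}^2\) is the same for all points, any comparison between two such distances is decided first by the \({\textcircled {1}}\) coefficients, i.e., by the projections of the points onto the direction d, and only in case of ties by the finite parts; this is why all the computations required by the sorting algorithm remain finite.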

The code, named FC\(_{{\textcircled {1}}}\) (fixed center—infinitely far), has been implemented in MATLAB (version R2017b), and it has been run on a Windows 10 system, characterized by a 2.21 GHz processor and 16 GB of RAM. It has been tested on 13 data sets drawn from the literature and listed in Table 1.

Table 1 Data sets

The first ten test problems are taken from the UCI machine learning repository (Murphy and Aha 1992), a collection of databases, domain theories, and data generators that are used by the machine learning community. Galaxy is the data set used in galaxy discrimination with neural networks (Odewahn et al. 1992), while a detailed description of g50c and g10n is reported in Chapelle and Zien (2005).

For managing the grossone arithmetic operations, we have used the new Simulink-based solution of the Infinity Computer (Falcone et al. 2020), where an arithmetic C++ library is integrated within the MATLAB environment. In particular, given two gross-numbers x and y, from such library we have used the following C++ subroutines:

  • TestGrossMatrix(x,y,’-’), returning the difference between x and y;

  • TestGrossMatrix(x,y,’+’), returning the sum of x and y;

  • TestGrossMatrix(x,y,’*’), returning the product of x and y;

  • GROSS_cmp(x,y), returning 1 if \(x>y\), -1 if \(x<y\) and 0 if \(x=y\).

Using the MATLAB notation, we have expressed any vector g of n gross-number elements (which in the sequel, for the sake of simplicity, we call a gross-vector) as a pair (G,fg), with

$$\begin{aligned} \mathtt{G = [g1; g2;...;gn]} \quad \text{ and } \quad \mathtt{fg = [fg1 \; fg2...fgn]}, \end{aligned}$$

where gj, \(j=1,\ldots ,n\), is an array of dimension \(s\times 2\) representing a gross-number Q, with \(s=h+r+1\) (see formula (3)). For each row of gj, the first element contains a gross-digit, while the second one contains the corresponding gross-power. Since s depends on r and h, which can be different for each array gj of the same gross-vector g, the scalar fgj, \(j=1,\ldots ,n\), is necessary to provide the position in G of the last component of gj.
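Under this packing convention, retrieving the ith gross-number amounts to slicing the appropriate rows of G. A minimal sketch of ours (the exact calling convention of the actual library routine may differ) is:

```matlab
% Recover the ith gross-number from the packed gross-vector (G, fg):
% fg(i) holds the index in G of the last row of the ith block.
function gi = extract(G, fg, i)
    if i == 1, first = 1; else, first = fg(i - 1) + 1; end
    gi = G(first:fg(i), :);     % rows [gross-digit, gross-power] of element i
end
```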

To manage the gross-vectors, we have also implemented the following new MATLAB subroutines:

  • realToGrossone(r), returning a grossone representation (G,fg) of a real vector r;

  • extract(G,fg,i), returning the ith gross-number in the gross-vector (G, fg);

  • normGrossone(G,fg), computing the squared Euclidean norm of the gross-vector (G,fg) (a sketch is given after this list);

  • scalProdG(G1,fg1,G2,fg2), computing the scalar product between the two gross-vectors (G1,fg1) and (G2,fg2);

  • BubbleSortGrossone(G,fg,sign), sorting the gross-vector (G, fg) in the ascending order if sign = 1 and in the descending order if sign = -1.
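As an illustration, normGrossone could be realized along the following lines (a sketch of ours reusing the toy gross_add of Sect. 3 and the extract sketch above, again restricted to finite gross-powers):

```matlab
% Product of two toy gross-numbers: convolve the [digit, power] rows.
function z = gross_mul(x, y)
    z = zeros(0, 2);
    for p = 1:size(x, 1)
        t = [x(p, 1) * y(:, 1), x(p, 2) + y(:, 2)];
        z = gross_add(z, t);
    end
end

% Squared Euclidean norm of the gross-vector (G, fg).
function nrm = normGrossone(G, fg)
    nrm = zeros(0, 2);
    for i = 1:numel(fg)
        gi = extract(G, fg, i);
        nrm = gross_add(nrm, gross_mul(gi, gi));
    end
end
```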

For each data set, in order to compute the best value of the parameter C, we have adopted a bilevel cross-validation strategy (Astorino and Fuduli 2016), varying C over the grid \(\{10^{-1},10^0,10^1,10^2\}\); this choice of the grid has been suggested by the necessity of obtaining a nonzero optimal value of z, which in turn provides the optimal value of the radius R, as shown in Astorino and Gaudioso (2009), where the authors proved the following proposition.

Proposition 1

The following statements hold:

  (i) if \(C<1/m\), then \(z^*=0;\)

  (ii) if \(C >1/m\), then \(z^*>0,\)

where \(z^*\) is any optimal solution to problem (1), when \(x_0\) is fixed.
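The proposition can be checked numerically with the fixed_center_sphere sketch of Sect. 2.2 on hypothetical toy data (random Gaussian clouds of our own choosing):

```matlab
% Quick numerical check of Proposition 1 (toy data, fixed seed).
rng(1);
A = randn(50, 2); B = randn(30, 2) + 4;      % two well-separated clouds
x0 = mean(A, 1).'; m = size(A, 1);
[z_small, ~] = fixed_center_sphere(A, B, x0, 0.5 / m);   % C < 1/m
[z_large, ~] = fixed_center_sphere(A, B, x0, 2 / m);     % C > 1/m
fprintf('C < 1/m: z* = %g;  C > 1/m: z* = %g\n', z_small, z_large);
```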

In Table 2, we report the results provided by Algorithm FC\(_{{\textcircled {1}}}\), expressed in terms of average testing correctness. We compare them with those of the following two classical fixed-center variants, obtained by setting

$$\begin{aligned} x_0=x_0^{{\mathcal {A}}} \quad \quad \text{(Algorithm } \text{ FC } _{{\mathcal {A}}}) \end{aligned}$$

and

$$\begin{aligned} x_0=x_0^{{\mathcal {A}}}+x_0^{{\mathcal {B}}} \quad \quad \text{(Algorithm } \text{ FC}_{{\mathcal {A}}{\mathcal {B}}}), \end{aligned}$$

respectively, and with the results obtained by a variant of the standard linear SVM (Algorithm SVM\(_0\)), where, in order to have a fair comparison, we have dropped the margin term by setting the penalty parameter BoxConstraint of the fitcsvm MATLAB subroutine equal to \(10^{6}\). We recall, in fact, that our spherical approach does not involve any margin concept. In Table 2, for each data set, the best result is underlined.
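For reference, a call of this kind might look as follows (a sketch on hypothetical toy data; fitcsvm belongs to the Statistics and Machine Learning Toolbox):

```matlab
% SVM_0 baseline: linear SVM with the margin term made negligible
% by a large BoxConstraint.
rng(1);
A = randn(50, 2); B = randn(30, 2) + 4;      % toy data, two Gaussian clouds
X = [A; B];
y = [ones(50, 1); -ones(30, 1)];
mdl = fitcsvm(X, y, 'KernelFunction', 'linear', 'BoxConstraint', 1e6);
trainCorrectness = mean(predict(mdl, X) == y)
```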

Table 2 Numerical results

In comparison with FC\(_{\mathcal {A}}\) and FC\(_{{\mathcal {A}}{\mathcal {B}}}\), the choice of the infinitely far center appears to be the best one; in fact, Algorithm FC\(_{{\textcircled {1}}}\) outperforms the other two approaches on all the data sets except Pima and Tic Tac Toe, where the best performance is obtained by fixing \(x_0\) as the barycenter of \({\mathcal {A}}\). We note also that choosing \(x_0\) as the barycenter of all the points is not a good strategy, since the corresponding results are very poor on all the test problems except Cancer and Tic Tac Toe, where the testing correctness appears comparable.

Also with respect to SVM\(_{0}\), Algorithm FC\(_{{\textcircled {1}}}\) exhibits a good performance, except on Diagnostic, Sonar and g10n, while on Pima the two approaches behave almost the same. These results were expected because, even if taking the center infinitely far makes spherical separability tend to linear separability, the two approaches differ substantially. We recall, in fact, that if two sets are linearly separable, they are also spherically separable (even taking a very large radius), but the converse is not true.

5 Conclusions

In this paper, we have introduced the idea of using the grossone theory within classification problems. In particular, we have focused our attention on the possibility of constructing a binary spherical classifier characterized by an infinitely far center. As shown by the numerical results, adopting the grossone theory allows us to obtain good performance in terms of average testing correctness, while managing the numerical computations very easily, with no tuning of the “big M” parameter required.

Future research could consist in extending this approach to the kernel trick, which is well suited to fixed-center spherical separation, as shown in Astorino and Gaudioso (2009), and in introducing the margin concept, as in Astorino et al. (2012b).