Abstract
New algorithms for the numerical solution of optimization problems involving the \(l_{0}\) pseudo-norm are proposed. They are designed to use a recently proposed computational methodology that is able to deal numerically with finite, infinite and infinitesimal numbers. This new methodology introduces an infinite unit of measure, expressed by the numeral \(\textcircled {1}\) (grossone), which represents the number of elements of the set \({\text {I}}\!{\text {N}}\) of natural numbers. We show how the numerical system built upon \(\textcircled {1}\), together with the proposed approximation of the \(l_0\) pseudo-norm in terms of \(\textcircled {1}\), can be successfully used in the solution of elastic net regularization problems and of sparse support vector machine classification problems.
1 Introduction
Given a vector x of n components, the \(l_0\) pseudo-norm \(\left\| {x} \right\| _0\), that is, the number of nonzero components of x, has often been used in optimization problems arising in various fields. However, the introduction of \(\left\| {x} \right\| _0\) makes these problems extremely hard to solve, so that approximations and iterative schemes have been proposed in the scientific literature to solve them efficiently.
The use of this pseudo-norm arises in many different fields, such as machine learning, signal processing, pattern recognition, portfolio optimization, subset selection problem in regression and elastic-net regularization. Cardinality constrained optimization problems are difficult to solve, and a common approach is to apply global discrete optimization techniques.
Quite recently Sergeyev, in a book and in a series of papers, proposed a novel approach to infinite and infinitesimal numbers. By introducing the new numeral grossone (indicated by \(\textcircled {1}\)), defined as the number of elements of the set of natural numbers, Sergeyev demonstrated how it is possible to operate with finite, infinite and infinitesimal quantities using the same arithmetic. This new numerical system allows infinite and infinitesimal numbers to be treated as particular cases of a single structure, and offers a new view of, and an alternative approach to, many fundamental aspects of mathematics such as limits, derivatives, sums of series and so on.
The aim of this paper is to show how this new numeral system, and in particular \(\textcircled {1}\), can be used in different optimization problems by replacing the \(l_0\) pseudo-norm \(\left\| {x} \right\| _0\) with the smooth grossone-based approximation

$$\begin{aligned} \left\| {x} \right\| _{0,\textcircled {1}^{-1}} := \sum _{i=1}^{n} \frac{x_i^2}{x_i^2+\textcircled {1}^{-1}} . \end{aligned}$$
Indeed, in the literature there are many contributions approximating the \(l_0\) pseudo-norm. For example, in Rinaldi et al. (2010) two new smooth approximations of the \(l_0\) pseudo-norm are presented, and other approximations are recalled in the following.
The paper is organized as follows. In Sect. 2, the new numeral system is presented by describing its main properties: Infinity, Identity and Divisibility. Moreover, the new positional numeral system and the concept of gross-number are discussed. In Sect. 3, the properties of \(\left\| {x} \right\| _0\) are introduced and some approximations proposed in the literature are presented. The definition of \(\left\| {x} \right\| _{0,{\textcircled {1}}^{-1}}\) is then introduced, and it is shown that \(\left\| {x} \right\| _{0}\) and \(\left\| {x} \right\| _{0,{\textcircled {1}}^{-1}}\) coincide in their finite part and may differ only by infinitesimal terms. Two different applications of \(\left\| {x} \right\| _{0,\textcircled {1}^{-1}}\) are presented in Sects. 4 and 5. The first, studied in Sect. 4, concerns elastic net regularization and an algorithm for solving the corresponding optimization problem.
In Sect. 5, the newly proposed definition of \(\left\| {x} \right\| _{0,\textcircled {1}^{-1}}\) is used in classification problems using sparse support vector machines (SVMs). In particular, we suggest an interpretation of an updating rule already proposed in the literature based on the KKT (Karush–Kuhn–Tucker) conditions and the expansion of gross-numbers.
We briefly describe some notations used throughout the paper. With \({\text {I}}\!{\text {N}}\), we indicate the set of natural numbers. Given \(n,m \in {\text {I}}\!{\text {N}}\), let \({\text {I}}\!{\text {R}}^n\) be the space of the n-dimensional vectors with real components and let \({\text {I}}\!{\text {R}}^{m\times n}\) be the space of matrices with real elements, m rows and n columns. All vectors are column vectors and are indicated with lower case Latin letters (e.g., x, y, z \( \in {\text {I}}\!{\text {R}}^n\)). Subscripts indicate components of a vector, while superscripts are used to identify different vectors. Matrices are indicated with upper case Roman letters (e.g., A, B \(\in {\text {I}}\!{\text {R}}^{m \times n}\)). If \(A \in {\text {I}}\!{\text {R}}^{m \times n}\), \(A^T_{i \cdot }\) is the ith row of A. The symbol \(\left\| {x} \right\| \) indicates the norm of a vector x; specific norms or parameters of the norm are indicated with subscripts. The scalar product of two vectors x, y in \({\text {I}}\!{\text {R}}^n\) is denoted by \({x}^T{y}\), while in a generic Hilbert space we use the notation \(\left\langle x,y \right\rangle \). The symbol \(:=\) denotes a definition. The gradient of a continuously differentiable function \(f:{\text {I}}\!{\text {R}}^n \rightarrow {\text {I}}\!{\text {R}}\) at a point \(x\in {\text {I}}\!{\text {R}}^n\) is indicated by \(\nabla f(x)\).
2 The algebra of \(\textcircled {1}\)
The numeral system, originally proposed by Sergeyev (2001, 2009, 2017), is based on the numeral \(\textcircled {1}\) (called grossone) defined as the number of elements of the set \({\text {I}}\!{\text {N}}\). This new infinite unit makes it possible to work numerically with infinities and infinitesimals. In particular, the numerical system built upon \(\textcircled {1}\) makes it possible to treat infinite and infinitesimal numbers within a unique framework, and to work with all of them numerically.
For instance, using \(\textcircled {1}\)-based numerals, it is possible to execute arithmetic operations with floating-point numbers and to assign concrete infinite and infinitesimal values to variables. Moreover, \(\textcircled {1}\) allows one to compute the number of elements of infinite sets more precisely, extending traditional set theory based on Cantor's cardinals. For example, the set of even numbers and the set of integers, which traditional cardinality identifies both as countable, have in this new numeral system \(\frac{\textcircled {1}}{2}\) and \(2 \textcircled {1}+1\) elements, respectively.
The new computational methodology has been successfully applied in several fields of pure and applied mathematics offering new and alternative approaches. Here, we only mention (numerical) differentiation (Sergeyev 2011a), ODE (Sergeyev et al. 2016; Amodio et al. 2017), optimization (Cococcioni et al. 2018; De Cosmis and De Leone 2012; De Leone 2018; De Leone et al. 2018; Gaudioso et al. 2018; Sergeyev et al. 2018), hyperbolic geometry (Margenstern 2012), infinite series and the Riemann zeta function (Sergeyev 2009, 2011b), biological processes and cellular automata (D’Alotto 2013). For a survey of the various aspects and applications of \(\textcircled {1}\), we refer the interested reader to Sergeyev (2001, 2010, 2016, 2017, 2019), Caldarola (2018), Calude and Dumitrescu (2020), Iudin et al. (2012), Lolli (2015), Rizza (2019) and to the references therein.
Following the procedure used in the past when the numeral 0 (zero) was introduced to extend the natural numbers to the integers, Sergeyev has introduced the new numeral \(\textcircled {1}\).
In particular, \(\textcircled {1}\) is introduced by adding the Infinite Unit Axiom postulate (IUA) to the axioms of real numbers. The IUA postulate is composed of three parts: Infinity, Identity and Divisibility:
1. Infinity: any finite natural number n is less than grossone, i.e., \(n<\textcircled {1}\), \(\forall n \in {\text {I}}\!{\text {N}}\).
2. Identity: the following relations link \(\textcircled {1}\) to the identity elements 0 and 1:

$$\begin{aligned} 0 \cdot {\textcircled {1}}&={\textcircled {1}}\cdot 0=0, \quad {\textcircled {1}} - {\textcircled {1}}=0, \quad \frac{\textcircled {1}}{\textcircled {1}}=1,\nonumber \\&\quad {\textcircled {1}}^{0} =1, \quad 1 ^{{\textcircled {1}}} =1, \quad 0^{{\textcircled {1}}} =0. \end{aligned}$$(1)
3. Divisibility: for any finite natural number n, the sets \({\text {I}}\!{\text {N}}_{k,n}\), \(1 \le k \le n\), called the nth parts of the set \({\text {I}}\!{\text {N}}\) of natural numbers and defined as
$$\begin{aligned} {\text {I}}\!{\text {N}}_{k,n}&= \left\{ k,k+n, k+2n,k+3n, \ldots \right\} , 1 \le k \le n,\nonumber \\&\quad \quad \displaystyle \bigcup _{k=1}^{n}{\text {I}}\!{\text {N}}_{k,n}= {\text {I}}\!{\text {N}}, \end{aligned}$$(2)

have the same number of elements, indicated by the numeral \(\frac{\textcircled {1}}{n}\). Note that \(\frac{\textcircled {1}}{n}\) is larger than any finite number.
Since this postulate is added to the standard axioms of real numbers, all standard properties (i.e., commutative and associative properties, distributive property of multiplication over addition, existence of inverse elements with respect to addition and multiplication, ...) also apply to \(\textcircled {1}\) and to grossone-based numerals. On the other hand, since in this framework it is possible to execute arithmetical operations with a variety of different infinities and infinitesimals, indeterminate forms as well as various kinds of divergences are not present when working with any (finite, infinite and infinitesimal) numbers of the new numerical system.
In this new positional numeral system, a gross-number (or gross-scalar) C can be represented similarly to traditional positional numeral systems, but with base number \(\textcircled {1}\), that is:

$$\begin{aligned} C = c_{p_{m}} {\textcircled {1}}^{p_{m}} + \cdots + c_{p_{1}} {\textcircled {1}}^{p_{1}} + c_{p_{0}} {\textcircled {1}}^{p_{0}} + c_{p_{-1}} {\textcircled {1}}^{p_{-1}} + \cdots + c_{p_{-k}} {\textcircled {1}}^{p_{-k}} , \end{aligned}$$(3)

where \(m,k\in {\text {I}}\!{\text {N}}\), for \(i=-k, -(k-1),\ldots ,-1,0, 1,\ldots ,m-1,m\), the numerals \(c_{p_{i}}\) are floating-point numbers and the exponents \(p_{i}\) are gross-numbers such that

$$\begin{aligned} p_{m}> p_{m-1}> \cdots> p_{1}> p_{0} = 0> p_{-1}> \cdots > p_{-k} . \end{aligned}$$
A gross-number is called finite if \(m=k=0\), it is called infinite if \(m>0\), and it is called infinitesimal if \(m=0\), \(c_{p_0}=0\) and \(k>0\). The exponents \(p_{i}\), \(i=-k, -(k-1),\ldots ,-1,0, 1,\ldots ,m-1,m\), are called gross-powers and can be finite, infinite, and infinitesimal. In (3), all numerals \(c_{p_{i}} \ne 0\), \(i=-k, -(k-1),\ldots ,-1,0, 1,\ldots ,m-1,m\), are called gross-digits and belong to a traditional numeral system (for example floating-point numbers).
We note that in this new numeral system, the record (3) represents the number C. Infinitesimal numbers are represented by numerals C having only negative (finite or infinite) gross-powers. The infinitesimal number \({\textcircled {1}}^{-1}\) verifies \({\textcircled {1}}^{-1} {\textcircled {1}}={\textcircled {1}} {\textcircled {1}}^{-1}=1\). Note that all infinitesimals are not equal to zero; in particular, \({\textcircled {1}}^{-1}=\frac{1}{\textcircled {1}}>0\) because it is the result of a division between two positive gross-numbers. In the following, we consider only gross-numbers having representation (3) with \(p_i\in {\mathbb {Z}}\), \(i=-k,\dots ,m-1,m\).
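For gross-numbers with integer gross-powers, the arithmetic of records of the form (3) behaves like ordinary polynomial arithmetic in the "base" \(\textcircled {1}\). The following toy Python class is a sketch of this bookkeeping only (it is our own illustration, not the Infinity Computer of Sergeyev (2001)); the class name `GrossNumber` and the method `finite_part` are our own choices.

```python
# Toy sketch: a gross-number with integer gross-powers stored as a map
# {gross-power: gross-digit}, e.g. 3*(1)^1 + 2 + 5*(1)^-1 -> {1: 3.0, 0: 2.0, -1: 5.0}.
class GrossNumber:
    def __init__(self, digits):
        # drop zero gross-digits so the representation is canonical
        self.digits = {p: c for p, c in digits.items() if c != 0}

    def __add__(self, other):
        # addition merges gross-digits of equal gross-powers
        out = dict(self.digits)
        for p, c in other.digits.items():
            out[p] = out.get(p, 0.0) + c
        return GrossNumber(out)

    def __mul__(self, other):
        # multiplication adds gross-powers, like polynomial convolution
        out = {}
        for p1, c1 in self.digits.items():
            for p2, c2 in other.digits.items():
                out[p1 + p2] = out.get(p1 + p2, 0.0) + c1 * c2
        return GrossNumber(out)

    def finite_part(self):
        # gross-digit of (1)^0: the finite component of the gross-number
        return self.digits.get(0, 0.0)

GROSSONE = GrossNumber({1: 1.0})      # the numeral (1)
GROSSONE_INV = GrossNumber({-1: 1.0})  # the infinitesimal (1)^-1
```

For instance, `GROSSONE * GROSSONE_INV` yields the finite number 1, and `(GrossNumber({1: 1.0, 0: 1.0})) * GROSSONE_INV` yields \(1 + \textcircled {1}^{-1}\), whose finite part is 1.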
We conclude this section by observing that the Infinity Computer is a new kind of supercomputer able to execute computations with finite, infinite and infinitesimal numbers numerically (not symbolically) using \(\textcircled {1}\) and the new numeral system. For more details, see Sergeyev (2001) and the references therein.
3 The \(l_0\) pseudo-norm and \(\textcircled {1}\)
In many problems in optimization and numerical analysis, it is extremely important to obtain a vector with the smallest possible number of components different from zero.
In Rinaldi et al. (2010), the problem of determining a vector belonging to a polyhedral set and having the minimum number of nonzero components is studied, and two smooth approximations of the \(l_0\) pseudo-norm are proposed. The general optimization problem with cardinality constraints is considered in Burdakov et al. (2016), where a reformulation as a smooth optimization problem is proposed. In Pham Dinh and Le Thi (2014) and Gotoh et al. (2018), the cardinality-constrained optimization problem is reformulated using a DC (difference of convex functions) approach. We refer the interested reader to Gotoh et al. (2018) for additional references to optimization problems where sparsity of the solution is required.
A vector with the minimum number of nonzero components can generally be obtained by adding to the original problem a term penalizing the number of components different from zero, or a term that approximately achieves the same goal.
Let \(x\in {\text {I}}\!{\text {R}}^n\). The \(l_0\) pseudo-norm is defined as

$$\begin{aligned} \left\| {x} \right\| _0 := \sum _{i=1}^{n} 1_{x_i} , \end{aligned}$$
where \(1_a\) is the characteristic (indicator) function that is equal to 1 if \(a \ne 0\) and zero otherwise. To be precise, \(\left\| {\cdot } \right\| _0\) is called the \(l_0\) pseudo-norm since it is not a norm: it fails absolute homogeneity. In fact, for \(x \in {\text {I}}\!{\text {R}}^n\), \(x\ne 0\), and \(0 \ne \lambda \in {\text {I}}\!{\text {R}}\), we have \(\left\| {\lambda x} \right\| _0 = \left\| {x} \right\| _0\), and hence \(\left\| {\lambda x} \right\| _0 = \left| \lambda \right| \left\| {x} \right\| _0 \) if and only if \(\left| \lambda \right| = 1\).
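This failure of absolute homogeneity can be checked numerically in a few lines (a sketch; the helper `l0` is our own):

```python
# Number of nonzero components of a vector: the l0 pseudo-norm.
def l0(x):
    return sum(1 for xi in x if xi != 0)

x = [3.0, 0.0, -1.5, 0.0, 2.0]
assert l0(x) == 3
# Scaling by lambda = 2 leaves l0 unchanged ...
assert l0([2.0 * xi for xi in x]) == l0(x)
# ... so l0(lambda*x) differs from |lambda|*l0(x) whenever |lambda| != 1.
assert l0([2.0 * xi for xi in x]) != 2.0 * l0(x)
```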
In the following sections, we present some specific uses of the \(l_0\) pseudo-norm for regularization and sparse-solution problems.
In Natarajan (1995), it is shown that computing sparse approximate solutions to linear systems is NP-hard. Moreover, in Amaldi and Kann (1998) it is shown that, for systems of linear relations, the problems of determining a solution violating the minimum number of relations (when the system itself is infeasible) and of determining a solution with as few nonzero variables as possible (if feasible) are both NP-hard, and various strong bounds on the approximability of different variants of these problems are discussed.
Therefore, various approximations of \( \left\| {x} \right\| _0 \) have been proposed in the literature. In the context of Feature Selection and Machine Learning, in Bradley and Mangasarian (1998) the following approximation is proposed:

$$\begin{aligned} \left\| {x} \right\| _0 \approx \sum _{i=1}^{n} \left( 1 - e^{-\alpha \left| x_i \right| }\right) , \end{aligned}$$

where \(\alpha \) is a given positive number, for which the authors suggest the value 5.
In Li and Ye (2017), in the context of elastic net regularization (discussed in detail in Sect. 4), the authors proposed the following approximation:

$$\begin{aligned} \left\| {x} \right\| _{0,\delta } := \sum _{i=1}^{n} \frac{x_i^2}{x_i^2+\delta } , \end{aligned}$$(7)

where \(\delta >0 \); smaller positive values of \(\delta \) provide a better approximation of \(\left\| {x} \right\| _0\).
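The behaviour of these two surrogates can be illustrated numerically. In the sketch below (the helper names are ours), the rational form \(x_i^2/(x_i^2+\delta )\) is the one that will be reused with \(\delta = \textcircled {1}^{-1}\), and it approaches \(\left\| {x} \right\| _0\) as \(\delta \rightarrow 0\):

```python
import math

def l0(x):
    # exact pseudo-norm: number of nonzero components
    return sum(1 for xi in x if xi != 0)

def l0_exp(x, alpha=5.0):
    # concave exponential surrogate of Bradley-Mangasarian (alpha = 5 suggested)
    return sum(1.0 - math.exp(-alpha * abs(xi)) for xi in x)

def l0_smooth(x, delta):
    # rational smoothing of Li-Ye: each nonzero term tends to 1 as delta -> 0
    return sum(xi**2 / (xi**2 + delta) for xi in x)

x = [1.0, 0.0, -2.0, 0.0, 0.5]
print(l0(x))                          # 3
print(l0_smooth(x, 1e-2))             # below 3: delta too coarse
print(l0_smooth(x, 1e-8))             # very close to 3
```

Note that both surrogates underestimate \(\left\| {x} \right\| _0\), and that for the rational form the error on each nonzero component is exactly \(\delta /(x_i^2+\delta )\).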
Following this suggestion and using the new numeral system, we propose to approximate the quantity \(\left\| {x} \right\| _0\) with

$$\begin{aligned} \left\| {x} \right\| _{0,\textcircled {1}^{-1}} := \sum _{i=1}^{n} \frac{x_i^2}{x_i^2+\textcircled {1}^{-1}} . \end{aligned}$$(8)
Let us study in detail the connections between \( \left\| {x} \right\| _0\) and \(\left\| {x} \right\| _{0,\textcircled {1}^{-1}}\). Let \(\displaystyle \psi (t):=\frac{t^2}{t^2 + \textcircled {1}^{-1} } \). Hence, \(\displaystyle \left\| {x} \right\| _{0,\textcircled {1}^{-1}} = \sum _{i=1}^{n} \psi (x_i) \). For \(i=1,\ldots ,n\), we assume

$$\begin{aligned} x_i = x_i^{(0)} + \textcircled {1}^{-1} R_{i} , \end{aligned}$$

where \(x_i^{(0)}\) is the finite part of \(x_i\) and \(R_{i}\) includes only finite and infinitesimal terms.
When \(x_i^{(0)} = 0 \),

$$\begin{aligned} \psi (x_i) = \frac{\textcircled {1}^{-2} R_i^2}{\textcircled {1}^{-2} R_i^2 + \textcircled {1}^{-1}} = \textcircled {1}^{-1} \frac{R_i^2}{\textcircled {1}^{-1} R_i^2 + 1} = \textcircled {1}^{-1} R_i' , \end{aligned}$$

where \(R_i'\) includes only finite and infinitesimal terms. Instead, when \(x_i^{(0)} \ne 0 \),

$$\begin{aligned} \psi (x_i) = 1 - \frac{\textcircled {1}^{-1}}{x_i^2 + \textcircled {1}^{-1}} = 1 - \textcircled {1}^{-1} S_i' , \end{aligned}$$

where again \(S_i'\) includes only finite and infinitesimal terms.
Therefore,

$$\begin{aligned} \left\| {x} \right\| _{0,\textcircled {1}^{-1}} = \left\| {x} \right\| _{0} + \textcircled {1}^{-1} T \end{aligned}$$(9)

for some gross-number T which includes only finite and infinitesimal terms. Hence, the finite parts of \( \left\| {x} \right\| _0 \) and \(\left\| {x} \right\| _{0,\textcircled {1}^{-1}} \) coincide.
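As a concrete instance of the computation above, take a purely finite nonzero component, \(x_i = x_i^{(0)} \ne 0\); the infinitesimal defect of \(\psi \) can then be written out explicitly:

```latex
\psi(x_i) = \frac{x_i^2}{x_i^2 + \textcircled{1}^{-1}}
          = 1 - \frac{\textcircled{1}^{-1}}{x_i^2 + \textcircled{1}^{-1}}
          = 1 - \frac{\textcircled{1}^{-1}}{x_i^2}
              + \frac{\textcircled{1}^{-2}}{x_i^2\,\bigl(x_i^2 + \textcircled{1}^{-1}\bigr)} ,
```

so each nonzero finite component contributes 1 plus terms of order \(\textcircled {1}^{-1}\) or below, while a component equal to zero contributes exactly 0; summing over i recovers the statement that the finite parts coincide.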
4 Elastic net regularization and \(\textcircled {1}\)
Given a matrix \(A \in {\text {I}}\!{\text {R}}^{m\times n}\) and a vector \(b \in {\text {I}}\!{\text {R}}^m\), in many important applications it is essential to determine a solution \(x\in {\text {I}}\!{\text {R}}^n\) of the system of linear equations \(Ax = b\) with the smallest number of nonzero components:

$$\begin{aligned} \min _{x \in {\text {I}}\!{\text {R}}^n} \left\| {x} \right\| _0 \quad \text{ subject } \text{ to } \quad Ax = b . \end{aligned}$$
The associated generalized elastic net regularization problem (Li and Ye 2017) is

$$\begin{aligned} \min _{x \in {\text {I}}\!{\text {R}}^n} \left\| {Ax-b} \right\| ^2 + \lambda _2 \left\| {x} \right\| ^2 + \lambda _0 \left\| {x} \right\| _0 , \end{aligned}$$(10)

where \(\lambda _0 > 0\) and \(\lambda _2 > 0\) are regularization parameters. In Li and Ye (2017), the authors suggest substituting \(\left\| {x} \right\| _0\) with \(\left\| {x} \right\| _{0,\delta }\), as defined in (7), for fixed positive \(\delta \), and a convergent algorithm for the solution of the corresponding optimization problem is proposed. Clearly, the obtained solution only approximates the optimal solution of (10).
We propose to use in (10) the approximation (8) of \(\left\| {x} \right\| _0\). Replicating some of the proofs in Li and Ye (2017), a convergent algorithm can be constructed. Moreover, by (9), apart from terms of order \(\textcircled {1}^{-1}\) or below, the obtained solution also solves problem (10).
In detail, consider the problem

$$\begin{aligned} \min _{x \in {\text {I}}\!{\text {R}}^n} f(x) , \end{aligned}$$

where

$$\begin{aligned} f(x) := \left\| {Ax-b} \right\| ^2 + \lambda _2 \left\| {x} \right\| ^2 + \lambda _0 \left\| {x} \right\| _{0,\textcircled {1}^{-1}} . \end{aligned}$$(11)

Let \(D(x) \in {\text {I}}\!{\text {R}}^{n\times n}\) be the diagonal matrix given by

$$\begin{aligned} D(x)_{ii} := \frac{\textcircled {1}^{-1}}{\left( x_i^2+\textcircled {1}^{-1}\right) ^{2}} , \quad i=1,\ldots ,n . \end{aligned}$$(12)

Then, the iterate \(x^{k+1}\) is obtained from \(x^{k}\) by solving the linear system

$$\begin{aligned} \Bigl (A^T A + \lambda _2 I + \lambda _0 D(x^{k}) \Bigr ) x^{k+1} = A^T b . \end{aligned}$$(13)
Following Li and Ye (2017), the iterative scheme we propose is given in Algorithm 4.1; moreover, Lemma 1 below is the basis for establishing the convergence result.
Lemma 1
Let f be given by (11). Let \(x^{k+1}\) be obtained from \(x^{k}\) by solving the system of Eq. (13). Then

$$\begin{aligned} f\left( x^{k+1}\right) \le f\left( x^{k}\right) - \lambda _2 \left\| {x^{k+1}-x^{k}} \right\| ^{2} - \lambda _0 \sum _{i=1}^{n} \frac{\textcircled {1}^{-1} \left( x_i^{k}-x_i^{k+1} \right) ^{2} }{\left( \left( x_i^{k}\right) ^2+ \textcircled {1}^{-1}\right) ^{2}} \end{aligned}$$(14)

and hence

$$\begin{aligned} \lambda _2 \left\| {x^{k+1}-x^{k}} \right\| ^{2} \le f\left( x^{k}\right) - f\left( x^{k+1}\right) . \end{aligned}$$(16)
Proof
See “Appendix A.” \(\square \)
The lemma above shows that the sequence \(\{f(x^k)\}\) is a non-increasing sequence. We are now ready to state the following convergence theorem.
Theorem 1
Let \(\mathcal {L}_0 :=\{x: f(x) \le f(x^0) \}\) be a compact set, and let \(\{x^k\}\) be the sequence produced by the iterative scheme (13). Then
1. the sequence \(\{x^k\}\) is all contained in \(\mathcal {L}_0\);
2. the sequence \(\left\{ x^{k}\right\} \) has at least one accumulation point;
3. each accumulation point of \(\{x^k\}\) belongs to \(\mathcal {L}_0\);
4. each accumulation point \({x}^{*}\) satisfies the condition

$$\begin{aligned} \Biggl (A^T A + \lambda _2 I + \lambda _0 D({x}^{*}) \Biggr ) {x}^{*}= A^T b , \end{aligned}$$

and hence is a stationary point of f.
Proof
First, from condition (14) in Lemma 1 we have \(f\left( x^{k+1}\right) \le f\left( x^k\right) \), and hence, the entire sequence \(\{x^k\}\) is contained in \(\mathcal {L}_0\). Moreover, the sequence \( \left\{ f\left( x^k\right) \right\} \) is a bounded non-increasing sequence, and hence, it is a convergent sequence. The existence of accumulation points follows from the compactness of \(\mathcal {L}_0\). Let now \({x}^{*}\) be an accumulation point of \(\{x^k\}\) and \(\left\{ x^{k_l}\right\} \) be a subsequence indexed by l converging to \({x}^{*}\). From (16), it follows that

$$\begin{aligned} \lambda _2 \left\| {x^{k_l+1}-x^{k_l}} \right\| ^{2} \le f\left( x^{k_l}\right) - f\left( x^{k_l+1}\right) . \end{aligned}$$

The right term converges to 0, and hence also the subsequence \(\left\{ x^{k_l+1}\right\} \) converges to the accumulation point \({x}^{*}\). Moreover, by (13),

$$\begin{aligned} \Bigl (A^T A + \lambda _2 I + \lambda _0 D({x}^{k_l}) \Bigr ) {x}^{k_l+1} = A^T b , \end{aligned}$$

and hence, taking the limit for \(l \rightarrow \infty \),

$$\begin{aligned} \Bigl (A^T A + \lambda _2 I + \lambda _0 D({x}^{*}) \Bigr ) {x}^{*} = A^T b . \end{aligned}$$
Therefore, \({x}^{*}\) is a stationary point of f. \(\square \)
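The scheme above can be exercised numerically. Since a standard computer cannot store \(\textcircled {1}^{-1}\), the following self-contained Python sketch (our own code, not the authors' implementation) mimics the infinitesimal with a small finite \(\delta \), and assumes the diagonal weight \(D(x)_{ii} = \delta /(x_i^2+\delta )^2\), a form consistent with the stationarity condition in Theorem 1; the helper names are ours.

```python
# Sketch of the iterative scheme: x^{k+1} solves
#   (A^T A + lam2*I + lam0*D(x^k)) x = A^T b,
# with the small finite delta standing in for grossone^-1.

def solve(M, rhs):
    # plain Gaussian elimination with partial pivoting (small dense systems)
    n = len(M)
    M = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def elastic_net_l0(A, b, lam0, lam2, delta=1e-8, iters=20):
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # start from the ridge solution (lam0 = 0), then iterate the scheme
    x = solve([[AtA[i][j] + (lam2 if i == j else 0.0) for j in range(n)] for i in range(n)], Atb)
    for _ in range(iters):
        # diagonal weight: huge on (numerically) zero components, tiny on large ones
        M = [[AtA[i][j]
              + (lam2 + lam0 * delta / (x[i] ** 2 + delta) ** 2 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        x = solve(M, Atb)
    return x
```

On a small overdetermined system generated by a sparse vector, the iteration drives the spurious component toward zero while leaving the significant one essentially untouched.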
5 Sparse support vector machine
In this section, we show how \(\textcircled {1}\) and the results of Sect. 3 can be used in the context of sparse support vector machines (SSVMs).
Assume that empirical data (training set) \((x^i, y_i), \) \(i = 1, \ldots , {l},\) are given, where \(x^i \in {\text {I}}\!{\text {R}}^n\) and \(y_{i} \in \{-1,1\}\), \(i = 1, \ldots , {l}\). Note that when the index i is used as a superscript the corresponding object is an input, while when it is used as a subscript the corresponding object is an output. The aim is to determine a hyperplane (and hence a vector \(w\in {\text {I}}\!{\text {R}}^n\) and a scalar \(\theta \)) such that:

$$\begin{aligned} y_i \left( {w}^T{x^i} + \theta \right) \ge 1 , \quad i = 1, \ldots , {l} . \end{aligned}$$
The classification function is \( x \mapsto {\text {sign}}\left( {w}^T{x} + \theta \right) \).
Classification in the feature space (instead of the original space) requires introducing a map

$$\begin{aligned} \phi : {\text {I}}\!{\text {R}}^n \rightarrow \mathcal {E} , \end{aligned}$$

where \(\mathcal {E}\) is a Hilbert space with scalar product \(\left\langle \cdot ,\cdot \right\rangle \). In classical SVM [see Cortes and Vapnik (1995), Cristianini and Shawe-Taylor (2000), Smola and Schölkopf (2004) and references therein], the construction of the optimal hyperplane requires solving the following (primal) optimization problem:

$$\begin{aligned} \begin{array}{ll} \displaystyle \min _{w, \theta , \xi } &{} \displaystyle \frac{1}{2}\left\langle w,w \right\rangle + C {e}^T{\xi } \\ \text{ subject } \text{ to } &{} y_i \left( \left\langle w,\phi (x^i) \right\rangle + \theta \right) \ge 1 - \xi _i , \quad i = 1, \ldots , {l}, \\ &{} \xi \ge 0 , \end{array} \end{aligned}$$(17)
where e is a vector with all elements equal to 1 and C is a positive scalar.
The corresponding dual problem is

$$\begin{aligned} \begin{array}{ll} \displaystyle \max _{\alpha } &{} \displaystyle {e}^T{\alpha } - \frac{1}{2} {\alpha }^T{Q \alpha } \\ \text{ subject } \text{ to } &{} {y}^T{\alpha } = 0 , \\ &{} 0 \le \alpha \le C e , \end{array} \end{aligned}$$

where

$$\begin{aligned} Q_{ij} = y_i y_j \left\langle \phi (x^i),\phi (x^j) \right\rangle , \quad i,j = 1, \ldots , {l} . \end{aligned}$$
The function

$$\begin{aligned} K(x,y) := \left\langle \phi (x),\phi (y) \right\rangle \end{aligned}$$

is called the kernel function. It is well known that, for the construction of the dual problem and of the classification function, complete knowledge of the function \(\phi (\cdot ) \) is not necessary: only the quantities \(K_{ij}=\left\langle \phi (x^i),\phi (x^j) \right\rangle \) are needed. In fact, from the KKT conditions,

$$\begin{aligned} w = \sum _{i=1}^{l} y_i \alpha _i \phi (x^i) , \end{aligned}$$
and the classification function is

$$\begin{aligned} x \mapsto {\text {sign}}\left( \sum _{i=1}^{l} y_i \alpha _i K(x^i,x) + \theta \right) . \end{aligned}$$
For a sparse representation of the SVM, the vector w is substituted by its expansion in terms of a coefficient vector \(\alpha \):

$$\begin{aligned} w = \sum _{j=1}^{l} \alpha _j \phi (x^j) . \end{aligned}$$(19)

Moreover, let \(K_{i.}\) be the column vector that corresponds to the ith row of the matrix \((K_{ij})\); note that

$$\begin{aligned} \left\langle w,\phi (x^i) \right\rangle = \sum _{j=1}^{l} \alpha _j K_{ij} = {K_{i.}}^T{\alpha } . \end{aligned}$$
In Huang et al. (2010), the authors consider the following optimization problem instead of (17), obtained by replacing \(\frac{1}{2}\left\langle w,w \right\rangle \) with \(\left\| {\alpha } \right\| _0\) and by using the expansion (19) of w in terms of \(\alpha \):

$$\begin{aligned} \begin{array}{ll} \displaystyle \min _{\alpha , \theta , \xi } &{} \displaystyle \left\| {\alpha } \right\| _0 + C {e}^T{\xi } \\ \text{ subject } \text{ to } &{} y_i \Bigl [ {K_{i.}}^T{\alpha } + \theta \Bigr ] \ge 1 - \xi _i , \quad i = 1, \ldots , {l}, \\ &{} \xi \ge 0 . \end{array} \end{aligned}$$(20)
In problem (20), the term \(\left\| {\alpha } \right\| _0\) is then replaced by \(\frac{1}{2}{\alpha }^T{\Lambda \alpha }\), where \(\Lambda \) is the diagonal matrix with \(\Lambda _{ii} = \lambda _i\), \(i = 1, \ldots , {l}\); this yields problem (21).
The following iterative scheme was proposed in Huang et al. (2010) to solve the above problem. In particular, given a very small positive value \(\epsilon \), Algorithm 5.1 sets the new value \(\lambda ^{k+1}_r\) for \(\lambda _r\) to \(1/\left( \alpha ^k_r\right) ^2\) if \(\left| \alpha ^k_r \right| \) is “significantly” different from zero, and to \(\lambda ^{k+1}_r = 1/\epsilon ^2\) otherwise.
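The per-component update just described can be sketched in a few lines of Python (our own helper; the threshold test \(\left| \alpha _r \right| \le \epsilon \) is one concrete reading of "significantly different from zero"):

```python
# Update rule of Algorithm 5.1 (sketch): a component judged numerically zero
# gets the large weight 1/eps^2, a clearly nonzero one gets 1/alpha_r^2,
# so the quadratic term alpha^T Lambda alpha / 2 mimics the l0 penalty
# at the current iterate.
def update_lambda(alpha, eps=1e-6):
    return [1.0 / eps**2 if abs(a) <= eps else 1.0 / a**2 for a in alpha]
```

With `eps=1e-6`, a component equal to 2 gets weight 0.25, while a zero (or tiny) component gets the huge weight \(10^{12}\), pushing it further toward zero at the next iterate.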
The KKT conditions for Problem (21) include the stationarity condition with respect to \(\alpha \),

$$\begin{aligned} \Lambda \alpha = \bar{K} \beta , \end{aligned}$$(22a)

together with the standard feasibility and complementarity conditions, where \(\beta \) is the vector of multipliers associated to the constraints \(y_i \Bigl [{K_{i.}}^T{\alpha }+ \theta \Bigr ] \ge 1 - \xi _i,\;\;\; i = 1, \ldots , {l}\), and \(\bar{K}_{rj} = y_j K_{jr}\), with \(r,j = 1, \ldots , {l}\). From (22a) it follows that

$$\begin{aligned} \lambda _r \alpha _r = \sum _{j=1}^{l} \bar{K}_{rj} \beta _j , \quad r = 1, \ldots , {l} . \end{aligned}$$
Once again, instead of \(\left\| {\alpha } \right\| _0\), we use \(\left\| {\alpha } \right\| _{0,\textcircled {1}^{-1}}\), and we propose to solve the following \(\textcircled {1}\)–Sparse SVM problem in place of (20):

$$\begin{aligned} \begin{array}{ll} \displaystyle \min _{\alpha , \theta , \xi } &{} \displaystyle \left\| {\alpha } \right\| _{0,\textcircled {1}^{-1}} + C {e}^T{\xi } \\ \text{ subject } \text{ to } &{} y_i \Bigl [ {K_{i.}}^T{\alpha } + \theta \Bigr ] \ge 1 - \xi _i , \quad i = 1, \ldots , {l}, \\ &{} \xi \ge 0 . \end{array} \end{aligned}$$(23)
By (8),

$$\begin{aligned} \left\| {\alpha } \right\| _{0,\textcircled {1}^{-1}} = \sum _{r=1}^{l} \frac{\alpha _r^2}{\alpha _r^2+\textcircled {1}^{-1}} . \end{aligned}$$

Then, for \(r = 1, \ldots , {l}\),

$$\begin{aligned} \frac{\partial }{\partial \alpha _r} \left\| {\alpha } \right\| _{0,\textcircled {1}^{-1}} = \frac{2\, \textcircled {1}^{-1} \alpha _r}{\left( \alpha _r^2+\textcircled {1}^{-1}\right) ^{2}} . \end{aligned}$$
The KKT conditions for the above problem (23) include the stationarity condition with respect to \(\alpha \):

$$\begin{aligned} \frac{2\, \textcircled {1}^{-1} \alpha _r}{\left( \alpha _r^2+\textcircled {1}^{-1}\right) ^{2}} = \sum _{j=1}^{l} \bar{K}_{rj} \beta _j , \quad r = 1, \ldots , {l} . \end{aligned}$$(24a)

Note that conditions (24a) can be rewritten as

$$\begin{aligned} 2\, \textcircled {1}^{-1} \, \frac{1}{\left( \alpha _r^2+\textcircled {1}^{-1}\right) ^{2}} \, \alpha _r = \sum _{j=1}^{l} \bar{K}_{rj} \beta _j , \quad r = 1, \ldots , {l} . \end{aligned}$$

From the above formula, comparing with \(\lambda _r \alpha _r = \sum _{j} \bar{K}_{rj} \beta _j\), it is “natural” to set the new value for \(\lambda _r\) proportional to

$$\begin{aligned} \frac{1}{\left( \alpha _r^2+\textcircled {1}^{-1}\right) ^{2}} . \end{aligned}$$
Now, for \( r = 1, \ldots , {l}\), let

$$\begin{aligned} \alpha _r = \alpha _r^{(0)} + \textcircled {1}^{-1} A , \end{aligned}$$

with \(\alpha _r^{(0)} \in {\text {I}}\!{\text {R}}\) and A finite or infinitesimal. When \(\alpha _r^{(0)} = 0 \),

$$\begin{aligned} \alpha _r^2 + \textcircled {1}^{-1} = \textcircled {1}^{-1} \left( 1 + \textcircled {1}^{-1} A^2 \right) \end{aligned}$$

and

$$\begin{aligned} \frac{1}{\left( \alpha _r^2+\textcircled {1}^{-1}\right) ^{2}} = \frac{1}{\textcircled {1}^{-2}} \left( 1 + \textcircled {1}^{-1} A' \right) , \end{aligned}$$(26)

with \(A'\) finite or infinitesimal. Instead, when \(\alpha _r^{(0)} \ne 0 \),

$$\begin{aligned} \alpha _r^2 + \textcircled {1}^{-1} = \left( \alpha _r^{(0)}\right) ^{2} \left( 1 + \textcircled {1}^{-1} A'' \right) \end{aligned}$$

and

$$\begin{aligned} \frac{1}{\left( \alpha _r^2+\textcircled {1}^{-1}\right) ^{2}} = \frac{1}{\left( \alpha _r^{(0)}\right) ^{4}} \left( 1 + \textcircled {1}^{-1} A''' \right) , \end{aligned}$$(27)

with \(A'',\ A'''\) finite or infinitesimal.
Formulas (26) and (27) mimic almost perfectly the updating formulas for \(\lambda ^{k+1}\) proposed in Huang et al. (2010) in order to solve (20) by using problem (21). The main difference is that, for the nonzero case, there is a power 4 instead of 2. However, using \(\textcircled {1}\) and considering problem (23), we can shed some light on the updating formulas for \(\lambda \), which otherwise appear quite arbitrary. The term \(\frac{1}{\textcircled {1}^{-2}}\) perfectly corresponds to the updating formula \(\lambda ^{k+1}_r = \frac{1}{\epsilon ^2} \). In the other case, the updating formula \(\lambda ^{k+1}_r = \frac{1}{\bar{\alpha }_r^2}\) is replaced by the term \(\frac{1}{\left( \alpha _r^{(0)}\right) ^4 }\).
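Keeping only the leading terms of (26) and (27), the two branches of the grossone-based weight can be summarized side by side with the updates of Algorithm 5.1:

```latex
\lambda_r \;\approx\;
\begin{cases}
\dfrac{1}{\textcircled{1}^{-2}} = \textcircled{1}^{2},
  & \alpha_r^{(0)} = 0
  \quad\text{(the analogue of } \lambda_r^{k+1} = 1/\epsilon^2\text{)},\\[2ex]
\dfrac{1}{\bigl(\alpha_r^{(0)}\bigr)^{4}},
  & \alpha_r^{(0)} \ne 0
  \quad\text{(the analogue of } \lambda_r^{k+1} = 1/\bar{\alpha}_r^{2}\text{, with power 4 instead of 2)},
\end{cases}
```

up to multiplicative factors \(1 + \textcircled {1}^{-1}(\cdot )\) with finite or infinitesimal \((\cdot )\).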
The use of \( \textcircled {1}\) allows both formulas to be obtained easily from the KKT condition (24a) and from observations on the expansion of the gross-number \(\alpha \).
6 Conclusions
In this paper, we have presented some possible uses in optimization problems of the novel approach to infinite and infinitesimal numbers proposed by Sergeyev (2001, 2009, 2017). In particular, optimization problems including a smoothed \(l_0\) penalty and classification problems involving sparse support vector machines are studied. In order to avoid the difficulties due to the use of the \(l_0\) pseudo-norm of a vector, we propose to approximate the \(l_0\) pseudo-norm by using a smooth function defined in terms of \(\textcircled {1}\). The results obtained using this new approximation in the two optimization problems match those presented in the literature. Moreover, the new \(\textcircled {1}\)-based methodology may represent a fruitful and promising tool to be exploited within other classification and regression problems, offering new views and different perspectives.
References
Amaldi E, Kann V (1998) On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theoret Comput Sci 209(1–2):237–260
Amodio P, Iavernaro F, Mazzia F, Mukhametzhanov MS, Sergeyev YD (2017) A generalized Taylor method of order three for the solution of initial value problems in standard and infinity floating-point arithmetic. Math Comput Simul 141:24–39
Bradley PS, Mangasarian OL (1998) Feature selection via concave minimization and support vector machines. In: Proceedings of the fifteenth international conference on machine learning, ICML’98. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 82–90
Burdakov O, Kanzow C, Schwartz A (2016) Mathematical programs with cardinality constraints: reformulation by complementarity-type conditions and a regularization method. SIAM J Optim 26(1):397–425
Caldarola F (2018) The Sierpinski curve viewed by numerical computations with infinities and infinitesimals. Appl Math Comput 318:321–328
Calude CS, Dumitrescu M (2020) Infinitesimal probabilities based on grossone. SN Comput Sci 1(1):36
Cococcioni M, Pappalardo M, Sergeyev YD (2018) Lexicographic multi-objective linear programming using grossone methodology: theory and algorithm. Appl Math Comput 318:298–311
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge
D’Alotto L (2013) A classification of two-dimensional cellular automata using infinite computations. Indian J Math 55:143–158
De Cosmis S, De Leone R (2012) The use of grossone in mathematical programming and operations research. Appl Math Comput 218(16):8029–8038
De Leone R (2018) Nonlinear programming and grossone. Appl Math Comput 318(C):290–297
De Leone R, Fasano G, Sergeyev YD (2018) Planar methods and grossone for the conjugate gradient breakdown in nonlinear programming. Comput Optim Appl 71(1):73–93
Gaudioso M, Giallombardo G, Mukhametzhanov M (2018) Numerical infinitesimals in a variable metric method for convex nonsmooth optimization. Appl Math Comput 318:312–320
Gotoh J, Takeda A, Tono K (2018) DC formulations and algorithms for sparse optimization problems. Math Program 169:141–176
Huang K, Zheng D, Sun J, Hotta Y, Fujimoto K, Naoi S (2010) Sparse learning for support vector classification. Pattern Recogn Lett 31(13):1944–1951
Iudin D, Sergeyev YD, Hayakawa M (2012) Interpretation of percolation in terms of infinity computations. Appl Math Comput 218(16):8099–8111
Li S, Ye W (2017) A generalized elastic net regularization with smoothed \(l_0\) penalty. Adv Pure Math 7:66–74
Lolli G (2015) Metamathematical investigations on the theory of grossone. Appl Math Comput 255:3–14
Margenstern M (2012) An application of grossone to the study of a family of tilings of the hyperbolic plane. Appl Math Comput 218(16):8005–8018
Natarajan BK (1995) Sparse approximate solutions to linear systems. SIAM J Comput 24(2):227–234
Pham DT, Le Thi HA (2014) Recent advances in DC programming and DCA. Springer, Berlin, pp 1–37
Rinaldi F, Schoen F, Sciandrone M (2010) Concave programming for minimizing the zero-norm over polyhedral sets. Comput Optim Appl 46:467–486
Rizza D (2019) Numerical methods for infinite decision-making processes. Int J Unconv Comput 14(2):139–158
Sergeyev YD (2001) Arithmetic of infinity. Edizioni Orizzonti Meridionali, Cosenza
Sergeyev YD (2009) Numerical point of view on Calculus for functions assuming finite, infinite, and infinitesimal values over finite, infinite, and infinitesimal domains. Nonlinear Anal Ser A Theory Methods Appl 71(12):e1688–e1707
Sergeyev YD (2010) Lagrange lecture: methodology of numerical computations with infinities and infinitesimals. Rendiconti del Seminario Matematico dell’Università e del Politecnico di Torino 68(2):95–113
Sergeyev YD (2011a) Higher order numerical differentiation on the infinity computer. Optim Lett 5(4):575–585
Sergeyev YD (2011b) On accuracy of mathematical languages used to deal with the Riemann zeta function and the Dirichlet eta function. p-Adic Numbers Ultrametr Anal Appl 3(2):129–148
Sergeyev YD (2016) The exact (up to infinitesimals) infinite perimeter of the Koch snowflake and its finite area. Commun Nonlinear Sci Numer Simul 31(1–3):21–29
Sergeyev YD (2017) Numerical infinities and infinitesimals: methodology, applications, and repercussions on two Hilbert problems. EMS Surv Math Sci 4:219–320
Sergeyev YD (2019) Independence of the grossone-based infinity methodology from non-standard analysis and comments upon logical fallacies in some texts asserting the opposite. Found Sci 24(1):153–170
Sergeyev YD, Mukhametzhanov M, Mazzia F, Iavernaro F, Amodio P (2016) Numerical methods for solving initial value problems on the infinity computer. Int J Unconv Comput 12(1):3–23
Sergeyev YD, Kvasov DE, Mukhametzhanov MS (2018) On strong homogeneity of a class of global optimization algorithms working with infinite and infinitesimal scales. Commun Nonlinear Sci Numer Simul 59:319–330
Smola AJ, Schölkopf B (2004) A tutorial on support vector regression. Stat Comput 14(3):199–222
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Communicated by Yaroslav D. Sergeyev.
Appendix A: Proof of Lemma 1

We prove Lemma 1 of Sect. 4 by adapting to this context the proof proposed in Li and Ye (2017, Lemma 2).
Let f be defined in (11). We have:
We analyze separately the different terms of (28). We can write:
where in the last step the definitions of the proposed iterative scheme (13) and the matrix D in (12) have been used.
Moreover,
Substituting (29) and (30) into (28) we have:
From the definition (8)
It is easy to prove (see Li and Ye 2017, Lemma 1) that, given \(\delta >0\), for any \(a, b \in {\text {I}}\!{\text {R}}\) the following inequality holds:
As a result, we can write
By using (32)–(34) into (31), we can conclude:
since \(\displaystyle \sum \nolimits _{i=1}^{n} \frac{\textcircled {1}^{-1} \left( x_i^{k}-x_i^{k+1} \right) ^{2} }{\left( \left( x_i^{k}\right) ^2+ \textcircled {1}^{-1}\right) ^{2}} \ge 0\) for any \(x_i^{k}\) and \(x_i^{k+1}\).
\(\square \)
De Leone, R., Egidi, N. & Fatone, L. The use of grossone in elastic net regularization and sparse support vector machines. Soft Comput 24, 17669–17677 (2020). https://doi.org/10.1007/s00500-020-05185-z