Constructive function approximation by neural networks with optimized activation functions and fixed weights

Li, Feng-Jun

doi:10.1007/s00521-018-3573-3

S.I. : Emergence in Human-like Intelligence towards Cyber-Physical Systems
Published: 09 June 2018

Volume 31, pages 4613–4628, (2019)

Neural Computing and Applications

Abstract

Our purpose in this paper is to construct three types of single-hidden-layer feed-forward neural networks (FNNs) with optimized piecewise linear activation functions and fixed weights. For continuous functions defined on bounded intervals, we establish upper and lower bound estimates of the approximation accuracy of these FNNs. We also prove that these three types of single-hidden-layer FNNs can interpolate any bounded and measurable function. Unlike existing methods, our approach does not require training. Our conclusions uncover the inherent approximation properties of the FNNs and also reveal the latent relationship among the approximation precision, the number of hidden units and the smoothness of the target function. Finally, we present numerical results that agree well with the theory.

1 Introduction

Owing to their broad learning and representation capabilities, neural networks are widely used in practice, with applications in almost all fields of natural science and in parts of social science [1,2,3,4,5,6]. The most widely used neural networks are feed-forward neural networks (FNNs). Many practical problems involving FNNs can be converted into problems of learning (or approximating) multivariate functions by FNNs with optimized activation functions. Examples arise in pattern recognition, information processing, engineering technology, computer science and systems control. Consequently, approximation by FNNs has been studied extensively [7,8,9,10,11,12,13,14].

In recent years, interpolation by FNNs (approximation with zero error, that is, exact approximation) has become an active research topic in the theory and application of FNNs and their generalizations, attracting the attention of researchers worldwide [15,16,17,18,19,20,21,22].

The most widely used and studied neural networks are arguably the FNNs with one hidden layer. The basic element of a neural network is called a “neuron” or a “unit,” and neurons are arranged in layers. An FNN with one hidden layer consists of three layers: an input layer, a hidden layer and an output layer. A sketch of such an FNN is shown in Fig. 1.

A three-layer FNN with d input units, m hidden units and one output unit is represented mathematically as follows:

$$\begin{aligned} N({\mathbf{x}}):=\sum _{i=1}^mc_i\sigma \left( \sum _{j=1}^dw_{ij}x_j+\theta _{i}\right) , ~{\mathbf{x}}=(x_1,x_2,\ldots ,x_d)\in R^d,\quad d\ge 1, \end{aligned}$$

(1.1)

where ${\mathbf{w}}_i=(w_{i1},w_{i2},\ldots ,w_{id})^T\in R^d$ are the connection weights between hidden unit i and the input units, $c_i\in R$ are the connection strengths between unit i and the output unit, $\theta _i\in R$ are the thresholds, and $\sigma $ is the activation function. The activation function is usually taken to be sigmoidal, that is, it satisfies $ \lim _{x\rightarrow +\infty }\sigma (x)=1 $ and $ \lim _{x\rightarrow -\infty }\sigma (x)=0. $ Equation (1.1) can be written in vector form as

$$\begin{aligned} N({\mathbf{x}}):=\sum _{i=1}^mc_i\sigma ({\mathbf{w}}_i\cdot {\mathbf{x}}+\theta _i),\quad {\mathbf{x}}\in R^d,\quad d\ge 1. \end{aligned}$$

We will study the following set of functions

$$\begin{aligned} {{\mathbb {N}}}_{n+1}^d(\phi ):=\left\{ N({\mathbf{x}})=\sum _{j=0}^nc_j\phi ({\mathbf{w}}_j\cdot {\mathbf{x}}+b_j), {\mathbf{w}}_j\in R^d, c_j, b_j\in R\right\} , \end{aligned}$$

where ${\mathbf{w}}_j\cdot {\mathbf{x}}$ denotes the ordinary dot product in $R^d$ and $\phi $ is a function from R to itself. We call the elements of ${{\mathbb {N}}}_{n+1}^d(\phi )$ three-layer FNNs with one hidden layer, and we call each summand of $N({\mathbf{x}})$ a unit.

Function approximation by FNNs (1.1) has been studied extensively in recent years, producing a variety of important results on density and complexity (see, e.g., [1,2,3,4,5,6,7,8,9,10,11,12,13,14]). The density problem is to determine conditions under which any function can be approximated by a three-layer FNN with arbitrary precision. The complexity problem is to determine the relationship between the smoothness of the approximated function and the cost (network size) needed to attain a desired accuracy. This is nearly equivalent to the problem of the rate of approximation [10]. In essence, the density problem is qualitative, while the complexity problem is quantitative. To date, many density and complexity results on the approximation of functions by the FNNs (1.1) in ${{\mathbb {N}}}_{n+1}^d(\phi )$ have been obtained by different approaches in more or less general settings (see, for instance, [10, 23,24,25,26,27,28,29,30] and the references therein). However, in the previous papers [1–41], the weights and thresholds of the FNNs vary, which makes the results difficult to apply in practice.

Let $S=\{{\mathbf{x}}_0, {\mathbf{x}}_1, \ldots , {\mathbf{x}}_n\}\subset R^d$ be a set of distinct vectors, let $\{y_i, i=0,1,\ldots ,n\}$ be a set of real numbers, and let $ ({\mathbf{x}}_0,y_0), ({\mathbf{x}}_1,y_1),\ldots , ({\mathbf{x}}_n,y_n) $ be the corresponding ordered pairs. An FNN (1.1) $N: R^d\rightarrow R$ is said to interpolate these ordered pairs if $N({\mathbf{x}}_i)=y_i, i=0, 1, \ldots , n$.

It is well known that three-layer FNNs with at most $n+1$ summands (elements of ${{\mathbb {N}}}_{n+1}^d(\phi )$) can learn $n+1$ distinct samples $({\mathbf{x}}_i, y_i)$ with zero error (exact approximation). Moreover, the weights ${\mathbf{w}}_j$ and thresholds $b_j$ can be chosen “almost” arbitrarily. Two main types of proof of this result have been given. One is analytic and can be found in [3, 4, 16, 17, 30, 31]. The other is algebraic and has constructive features, as in [4, 6, 18,19,20,21, 31,32,33, 42]. Other direct methods of finding the weights are more difficult and burdensome [4, 18]. The proofs in these references show that such methods are of limited practical use. Almost all the algebraic and other direct approaches require solving an $(n+1)\times (n+1)$ linear system or inverting the corresponding matrix, which is costly, particularly when the number of units is large.
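For comparison with the fixed-weight construction developed later, here is a minimal sketch of the classical algebraic route described above. It rests on assumptions of our own: $d=1$, the logistic sigmoid as $\phi $, and a hypothetical choice of fixed inner parameters, namely one common slope and thresholds placed half a spacing to the left of each sample. Only the output weights $c_j$ are computed. They come from Gaussian elimination on the $(n+1)\times (n+1)$ system, which takes on the order of $(n+1)^3$ operations. The helper names `logistic`, `solve` and `interpolating_fnn` are illustrative and do not come from the paper.

```python
import math


def logistic(t):
    """Numerically stable logistic sigmoid 1 / (1 + e^{-t})."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(A, y):
    """Solve A c = y by Gaussian elimination with partial pivoting (O(m^3) work)."""
    m = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]  # augmented matrix [A | y]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-14:
            raise ValueError("matrix is (numerically) singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for k in range(col, m + 1):
                M[r][k] -= factor * M[col][k]
    c = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * c[k] for k in range(r + 1, m))
        c[r] = (M[r][m] - s) / M[r][r]
    return c


def interpolating_fnn(xs, ys, slope_factor=20.0):
    """Return N(x) = sum_j c_j * logistic(w * (x - t_j)) with N(xs[i]) = ys[i].

    xs must be strictly increasing. Hypothetical fixed inner parameters:
    a common slope w = slope_factor / (min spacing) and thresholds t_j half a
    spacing left of each sample, so the matrix is nearly lower-triangular
    (hence nonsingular). Only the output weights c_j are solved for.
    """
    h = min(x1 - x0 for x0, x1 in zip(xs, xs[1:]))
    w = slope_factor / h
    ts = [x - h / 2 for x in xs]
    A = [[logistic(w * (xi - tj)) for tj in ts] for xi in xs]
    c = solve(A, ys)
    return lambda x: sum(cj * logistic(w * (x - tj)) for cj, tj in zip(c, ts))
```

The threshold placement only serves to make nonsingularity easy to see. With other weight choices, the dense system still has to be solved, and its conditioning may be poor.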

2 Description of problems

The problems considered in this paper are as follows. In previous studies of density, complexity of approximation and interpolation [1–41], the weights and thresholds of the FNNs vary. The theoretical results are therefore difficult to apply to approximate computation and related tasks. To make them easier to apply, Ismailov [43] studied function approximation by FNNs with weights varying on a finite set of directions. Nageswara [44] considered learning a function f by feed-forward sigmoid networks with a single hidden layer and bounded weights. For any continuous function on a compact subset of R, Chui and Li [11] established a density result for FNNs with a sigmoidal function and integer weights and thresholds. Ito [12] proved that FNNs with sigmoidal functions and unit weights can approximate any continuous function to arbitrary precision on a compact subset of R.

What if the weights and thresholds of the FNNs are fixed? Can such FNNs still approximate arbitrary continuous functions? To our knowledge, this question was first answered by Hahm and Hong [45]. They showed that FNNs with a sigmoidal activation function and fixed weights can approximate any function in $C_0$ on R to arbitrary precision. Unfortunately, the findings mentioned above are almost entirely qualitative [45]. From the application point of view, however, quantitative studies of FNN approximation are more useful.

By quantitative research we mean upper and lower bound estimates of the approximation ability of neural networks. (If the upper and lower bounds have the same order, this order is called the essential order of approximation [46].) Over the past ten years, Xu and collaborators pioneered this line of research. They obtained a series of important theoretical results that lay key foundations for further study of the complexity of neural network approximation [46,47,48,49,50,51,52,53]. However, these results are mainly theoretical and are not easy to use in practice for several reasons. The inner universal functions are highly non-smooth or incomputable [47, 52], or are required to be infinitely differentiable [46, 47, 51, 53]. In other cases, the number of hidden neurons is exceedingly large [46, 50, 52], or the weights and thresholds of the FNNs are variable [51,52,53].

Based on the above systematic analysis, the following problems arise naturally:

Problem 2.1

Can we fix the weights and thresholds of FNNs and still give a quantitative analysis of their approximation ability, so that they are easy to use in theory and application?

Problem 2.2

How is the approximation capability of an FNN with fixed weights and thresholds related to the topology of the network? Loosely speaking, how many hidden units does such a network need to reach a prescribed approximation precision?

Problem 2.3

Can the weights and thresholds of an exact FNN approximation be obtained without training? In other words, is there an effective way to choose fixed weights and thresholds of an FNN that satisfy the interpolation conditions?

The purpose of this paper is to solve all of the above problems. To this end, we construct three types of FNNs with optimized activation functions and fixed weights and thresholds, and we establish quantitative approximation theorems. In Sect. 3, the optimized activation functions and the three types of FNNs are defined. In Sect. 4, approximation and interpolation results are obtained. In Sect. 5, we apply the theoretical results and present numerical approximation and interpolation results that agree well with the theory. Finally, we summarize the paper and outline problems for further study.

3 Optimized activation function and constructed FNNs

In this section, we revisit the activation functions commonly used in the literature. A neuron cannot remain excited or inhibited indefinitely. Therefore, in this work we assume that both excitation and inhibition exist. Based on this hypothesis, we define triangular and trapezoidal units. Using these neurons, we obtain several activation functions that naturally explain why the neural network has the uniform approximation property. In what follows, C[a, b] denotes the set of all continuous functions $f: [a,b]\rightarrow R$ on the bounded interval [a, b].

Let $\sigma : R\rightarrow [0,c]$ be the ramp function defined by

$$\begin{aligned} \sigma (x):= \left\{ \begin{array}{ll} 0, &{}\quad x\le -\,\mu _0,\\ c, &{}\quad x\ge \mu _0,\\ \frac{x+\mu _0}{2\mu _0}c, &{}\quad -\,\mu _0<x<\mu _0, \end{array} \right. c\in R^+,\quad 0<\mu _0\le \frac{1}{2}. \end{aligned}$$

(3.1)

Figure 2 shows the ramp function defined by Eq. (3.1).

Remark 3.1

The ramp transfer function defined above is an example of a sigmoidal activation function when $c=1$. If $c=1$ and $\mu _0=1/2$, it coincides with the one used in [22].

We now construct a new function $\varphi _1$ from ramp functions. We define

$$\begin{aligned} \varphi _1(x) &:= \sigma (x+\mu _0)-\sigma (x-\mu _0)\nonumber \\&= {} \left\{ \begin{array}{ll} 0, &{}\quad |x|\ge 2\mu _0,\\ \left( 1-\frac{1}{2\mu _0}|x|\right) c, &{}\quad |x|<2\mu _0, \end{array} \right. c\in R^+,\quad 0<\mu _0\le \frac{1}{2}. \end{aligned}$$

(3.2)

Thus, Eq. (3.2) illustrates the triangle function (see Fig. 3).
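As a quick numerical sanity check of Eqs. (3.1) and (3.2), the sketch below implements the ramp $\sigma $ and builds $\varphi _1$ literally as $\sigma (x+\mu _0)-\sigma (x-\mu _0)$. It then compares the result with the closed-form triangle. The function names are ours. The defaults $c=1$, $\mu _0=1/2$ are the case mentioned in Remark 3.1.

```python
def ramp(x, c=1.0, mu0=0.5):
    """Ramp function sigma of Eq. (3.1): 0 for x <= -mu0, c for x >= mu0, linear in between."""
    if not (c > 0 and 0 < mu0 <= 0.5):
        raise ValueError("need c > 0 and 0 < mu0 <= 1/2")
    if x <= -mu0:
        return 0.0
    if x >= mu0:
        return c
    return (x + mu0) / (2 * mu0) * c


def triangle(x, c=1.0, mu0=0.5):
    """phi_1 of Eq. (3.2), built exactly as the difference of two shifted ramps."""
    return ramp(x + mu0, c, mu0) - ramp(x - mu0, c, mu0)


def triangle_closed_form(x, c=1.0, mu0=0.5):
    """Piecewise right-hand side of Eq. (3.2), used as a cross-check."""
    return (1 - abs(x) / (2 * mu0)) * c if abs(x) < 2 * mu0 else 0.0
```

For any admissible $c$ and $\mu _0$, the two triangle implementations agree up to rounding, and `triangle(0.0)` returns the peak value $c$.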

Figures 2 and 3 show activation functions with unbounded excitation or unbounded inhibition. From a biological point of view, however, excitation and inhibition should not arise abruptly. There should be balance (buffer) zones for both excitation and inhibition. For this reason, we define a more reasonable nonnegative activation function as follows:

$$\begin{aligned} \varphi _2(x)&:= \sigma (2x+3\mu _0)-\sigma (2x-3\mu _0)\nonumber \\&= \left\{ \begin{array}{ll} 0, & \quad |x|\ge 2\mu _0,\\ \left( 2-\frac{1}{\mu _0}|x|\right) c, & \quad \mu _0<|x|<2\mu _0,\\ c, &{}\quad |x|\le \mu _0, \end{array} \right. c\in R^+,\quad 0<\mu _0\le \frac{1}{2}, \end{aligned}$$

(3.3)

which optimizes the ramp and triangle activation functions. Equation (3.3) defines a trapezoidal activation function, illustrated in Fig. 4.

Remark 3.2

The ramp activation functions, triangle activation functions and trapezoidal activation functions are all piecewise linear activation functions.

The triangle function $\varphi _1(x)$ and the trapezoidal function $\varphi _2(x)$ have the following useful properties:

$\hbox {P}_1$: Both $\varphi _1(x)$ and $\varphi _2(x)$ are even functions;
$\hbox {P}_2$: Both $\varphi _1(x)$ and $\varphi _2(x)$ are non-decreasing for $x<0$ and non-increasing for $x>0$;
$\hbox {P}_3$: ${\mathrm{Supp}}(\varphi _j)\subseteq [-\,2\mu _0,2\mu _0]\subseteq [-\,1,1],~j=1, 2$.
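Properties P$_1$–P$_3$ can be checked on a grid. The sketch below is illustrative only, and a grid check is of course not a proof. `triangle` is the closed form of (3.2). `trapezoid` assumes that $\varphi _2$ is the continuous piecewise linear function with plateau $c$ on $|x|\le \mu _0$, straight sides $c(2-|x|/\mu _0)$, and zero for $|x|\ge 2\mu _0$. The helper `has_properties` is a hypothetical name.

```python
def triangle(x, c=1.0, mu0=0.5):
    """phi_1 of Eq. (3.2) in closed form."""
    return (1 - abs(x) / (2 * mu0)) * c if abs(x) < 2 * mu0 else 0.0


def trapezoid(x, c=1.0, mu0=0.5):
    """phi_2: plateau c for |x| <= mu0, zero for |x| >= 2*mu0.

    Assumption: the sides are the straight lines c*(2 - |x|/mu0), i.e. the
    continuous piecewise linear function joining the plateau to zero.
    """
    ax = abs(x)
    if ax <= mu0:
        return c
    if ax < 2 * mu0:
        return (2 - ax / mu0) * c
    return 0.0


def has_properties(phi, c=1.0, mu0=0.5, steps=600):
    """Grid check of P1 (even), P2 (non-increasing on x >= 0) and P3 (zero off [-2mu0, 2mu0])."""
    xs = [-1.5 + 3.0 * k / steps for k in range(steps + 1)]
    even = all(phi(x, c, mu0) == phi(-x, c, mu0) for x in xs)
    right = [phi(x, c, mu0) for x in xs if x >= 0]
    non_increasing = all(p >= q for p, q in zip(right, right[1:]))
    support = all(phi(x, c, mu0) == 0.0 for x in xs if abs(x) >= 2 * mu0)
    return even and non_increasing and support
```

For example, `triangle(mu0, c, mu0)` returns $c/2$ and `trapezoid(mu0, c, mu0)` returns $c$. These are the two values that enter the lower bound (4.6).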

We now consider the uniformly spaced nodes $x_k=a+kh, k=0,1,\ldots ,n$, on the interval [a, d], where $h=\frac{d-a}{n}=\frac{2(b-a)}{n}$ and $b=\frac{a+d}{2}$ is the midpoint of [a, d].

We can now construct three types of FNNs based on the piecewise linear activation functions $\varphi _1(x)$ and $\varphi _2(x)$ defined above.

Definition 3.1

Let $n\in N^+$ and let $f: [a,d]\rightarrow R$ be a bounded and measurable function. We construct the FNNs with the optimized piecewise linear activation functions $\varphi _1(x)$ and $\varphi _2(x)$ as follows:

$$\begin{aligned} N_{n, j}(f, x):=\frac{\sum _{k=0}^nf(x_k)\varphi _j\left( \frac{n}{2(b-a)}x-\frac{n}{2(b-a)}x_k\right) }{\sum _{k=0}^n\varphi _j\left( \frac{n}{2(b-a)}x-\frac{n}{2(b-a)}x_k\right) },\quad x\in [a,d], \quad j=1, 2. \end{aligned}$$

(3.4)
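To make Definition 3.1 concrete, here is a minimal Python sketch of $N_{n,j}(f,x)$ with the uniform nodes $x_k=a+kh$. Because $h=2(b-a)/n$, the factor $\frac{n}{2(b-a)}$ equals $1/h$. Every hidden unit therefore has the fixed input weight $1/h$ and threshold $-x_k/h$, and only the samples $f(x_k)$ enter the output, so no training is involved. The function names are hypothetical. `trapezoid` assumes straight sides $c(2-|u|/\mu _0)$ between its plateau and zero.

```python
def triangle(u, c=1.0, mu0=0.5):
    """phi_1 of Eq. (3.2)."""
    return (1 - abs(u) / (2 * mu0)) * c if abs(u) < 2 * mu0 else 0.0


def trapezoid(u, c=1.0, mu0=0.5):
    """phi_2, assuming sides c*(2 - |u|/mu0) between the plateau |u| <= mu0 and zero at |u| >= 2*mu0."""
    au = abs(u)
    if au <= mu0:
        return c
    return (2 - au / mu0) * c if au < 2 * mu0 else 0.0


def fixed_weight_fnn(f, a, d, n, phi, c=1.0, mu0=0.5):
    """Return the map x -> N_{n,j}(f, x) of Eq. (3.4) on [a, d].

    Nodes are x_k = a + k*h with h = (d - a)/n; since n/(2(b - a)) = 1/h,
    hidden unit k has fixed input weight 1/h and threshold -x_k/h.
    Nothing is trained: the samples f(x_k) are the only data used.
    """
    h = (d - a) / n
    nodes = [a + k * h for k in range(n + 1)]
    samples = [f(xk) for xk in nodes]

    def N(x):
        acts = [phi((x - xk) / h, c, mu0) for xk in nodes]
        total = sum(acts)
        if total == 0.0:  # guard: no node lies within 2*mu0*h of x
            raise ValueError("no active hidden unit at this x; increase mu0")
        return sum(s * g for s, g in zip(samples, acts)) / total

    return N
```

For either activation function, $N(x_k)=f(x_k)$, because $\varphi _j$ vanishes at every nonzero integer when $\mu _0\le 1/2$. With $c=1$, $\mu _0=1/2$ and `phi=triangle`, the hats also sum to one on [a, d]. In that case $N_{n,1}(f,\cdot )$ coincides with the piecewise linear interpolant of f at the nodes.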

Remark 3.3

In fact, in [22], similar neural networks are called “interpolation neural network operators.” There they were introduced and studied only for $\varphi _1(x)$ with parameter $c=1$ (i.e., when $\sigma $ is a sigmoidal function), with $\mu _0=1/2$ and step $h=\frac{b-a}{n}$.

Of course, FNNs can also be built with other types of sigmoidal activation functions. Denote by

$$\begin{aligned} M_s(x):=\frac{1}{(s-1)!}\sum _{i=0}^s(-\,1)^i{s\atopwithdelims ()i}(\mu _0s+x-i)^{s-1}_{+},\quad x\in R, \end{aligned}$$

the well-known B-splines of order $s\in N^+$ [54]. Here and hereafter, $(x)_+:=\max \{x, 0\}$ denotes the positive part of x. The functions $M_s$ have compact support, ${\mathrm{supp}}(M_s)\subseteq [-\,\mu _0s, (1-\mu _0)s]$, which reduces to $[-\,s/2, s/2]$ when $\mu _0=1/2$. Only in this case are the $M_s(x)$ the well-known central B-splines. We recall another kind of sigmoidal function, $\sigma _{M_s}(x)$, first introduced in [34]:

$$\begin{aligned} \sigma _{M_s}(x):=\int _{-\infty }^{x}M_s(t){\mathrm{d}}t,\quad x\in R. \end{aligned}$$
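Each truncated power $(t)_+^{s-1}/(s-1)!$ integrates to $(t)_+^{s}/s!$. Hence $\sigma _{M_s}$ has the closed form $\sigma _{M_s}(x)=\frac{1}{s!}\sum _{i=0}^s(-1)^i{s\atopwithdelims ()i}(\mu _0s+x-i)_+^{s}$. The sketch below evaluates it. The function name is ours. Once x passes the right end $(1-\mu _0)s$ of the support of $M_s$, the sketch returns 1, since $M_s$ integrates to one.

```python
import math


def sigma_Ms(x, s, mu0=0.5):
    """Closed form of sigma_{M_s}(x) = integral of M_s from -infinity to x.

    Each truncated power (t)_+^{s-1}/(s-1)! integrates to (t)_+^s/s!, so
        sigma_{M_s}(x) = (1/s!) * sum_{i=0}^{s} (-1)^i C(s, i) (mu0*s + x - i)_+^s.
    """
    if x >= (1 - mu0) * s:
        # Past the right end of supp(M_s) = [-mu0*s, (1 - mu0)*s]; M_s integrates to 1.
        return 1.0
    total = 0.0
    for i in range(s + 1):
        t = mu0 * s + x - i
        if t > 0:  # (t)_+^s vanishes for t <= 0
            total += (-1) ** i * math.comb(s, i) * t ** s
    return total / math.factorial(s)
```

With $s=1$ and $\mu _0=1/2$, `sigma_Ms(x, 1)` equals $\min \{\max \{x+1/2,0\},1\}$, which is the ramp $\sigma $ with $c=1$ and $\mu _0=1/2$.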

It is easy to see that, for $c=1$ and $\mu _0=1/2$, $\sigma _{M_1}(x)$ coincides with the piecewise linear function $\sigma (x)$. We now define the following nonnegative activation functions:

$$\begin{aligned} \varphi _s(x):= \sigma _{M_s}(x+\mu _0)-\sigma _{M_s}(x-\mu _0),\quad x\in R, \end{aligned}$$

(3.5)

for any $s\in N^+$. Similarly to the case of the piecewise linear functions $\varphi _1$ and $\varphi _2$, the function $\varphi _s(x)$ possesses the following properties:

$\hbox {P}_4$: $\varphi _s(x)$ is an even function;
$\hbox {P}_5$: $\varphi _s(x)$ is non-decreasing for $x<0$ and non-increasing for $x\ge 0$;
$\hbox {P}_6$: ${\mathrm{supp}}(\varphi _s)\subseteq [-\,K_s,K_s]:=[-\,\mu _0(s+1),\mu _0(s+1)]$, and $\varphi _s\left( \frac{K_s}{2}\right) =\varphi _s\left( \frac{\mu _0(s+1)}{2}\right) >0$.

Definition 3.2

For any bounded and measurable function $f: [a,d]\rightarrow R$, the FNN approximators with activation function $\varphi _s$ are defined by

$$\begin{aligned} N_{n, s}(f, x):=\frac{\sum _{k=0}^nf(x_k)\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) }{\sum _{k=0}^n\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) },\quad x\in [a,d]. \end{aligned}$$

(3.6)
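The sketch below evaluates $\varphi _s$ of (3.5) and the operator $N_{n,s}(f,x)$ of (3.6). As before, it writes $\frac{n}{2(b-a)}$ as $1/h$. Here $\sigma _{M_s}$ comes from the closed form obtained by integrating the truncated powers term by term. We only exercise the central B-spline case $\mu _0=1/2$. All function names are illustrative.

```python
import math


def sigma_Ms(x, s, mu0=0.5):
    """Closed form of the integral of the B-spline M_s from -infinity to x."""
    if x >= (1 - mu0) * s:  # beyond supp(M_s) the integral equals 1
        return 1.0
    total = 0.0
    for i in range(s + 1):
        t = mu0 * s + x - i
        if t > 0:
            total += (-1) ** i * math.comb(s, i) * t ** s
    return total / math.factorial(s)


def phi_s(x, s, mu0=0.5):
    """phi_s of Eq. (3.5)."""
    return sigma_Ms(x + mu0, s, mu0) - sigma_Ms(x - mu0, s, mu0)


def fixed_weight_fnn_s(f, a, d, n, s, mu0=0.5):
    """Return x -> N_{n,s}(f, x) of Eq. (3.6), using n/(2(b - a)) = 1/h with h = (d - a)/n."""
    h = (d - a) / n
    K = mu0 * (s + 1)  # K_s of property P6
    nodes = [a + k * h for k in range(n + 1)]
    samples = [f(xk) for xk in nodes]

    def N(x):
        acts = [phi_s(K * (x - xk) / h, s, mu0) for xk in nodes]
        return sum(v * g for v, g in zip(samples, acts)) / sum(acts)

    return N
```

For $\mu _0=1/2$, `phi_s(K, s)` vanishes at $K=K_s$ and `phi_s(K / 2, s)` is positive, matching P$_6$. At a node $x_k$, every other node contributes an argument of absolute value at least $K_s$. Hence $N_{n,s}(f,x_k)=f(x_k)$.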

Remark 3.4

In fact, when $s=1$, the FNNs defined in (3.6) reduce to those in (3.4). The neural networks $N_{n,s}(f,x)$ were originally defined and studied in [22] in the case $\mu _0=1/2$ with step $h=\frac{b-a}{n}$.

4 Theoretical results

In this section, we prove quantitative approximation results for the families of FNNs $N_{n, j},~j=1, 2$, and $N_{n, s},~s\in N^+$. For every $r\in N^{+}$, $\delta >0$ and $f\in C[a,d]$, we first recall the well-known definitions of the rth order modulus of smoothness and of the Lipschitz function class [55]:

$$\begin{aligned} \omega _r(f,\delta ):=\sup _{a\le x,x+rt\le d, |t|\le \delta }|\Delta _t^rf(x)|, \end{aligned}$$

where $\Delta _t^rf(x):=\Delta _t^1\Delta _t^{r-1}f(x)$ and $\Delta _t^1f(x):=f(x+t)-f(x)$. Note that, when $r=1$, we have $\omega _1(f,\delta )=\omega (f,\delta )$. That is, the first order modulus of smoothness of f is the modulus of continuity of f.

$$\begin{aligned} \text {Lip}(f, \alpha )_2:=\left\{ f~|~\omega _2(f,\delta )\le M\delta ^{\alpha },~ \alpha \in (0,2],\; M\;{\mathrm{is}}\;{\mathrm{a}}\;{\mathrm{positive}}\;{\mathrm{constant}}\right\} . \end{aligned}$$

The rth order modulus of smoothness has the following useful properties:

1. $\omega _r(f,\delta )$ is a non-decreasing continuous function of $\delta $, and $\omega _r(f,0)=0$;
2. If $0\le s<r,$ then $\omega _r(f,\delta )\le 2^{r-s}\omega _s(f,\delta )$;
3. If $0<\delta <\eta ,$ then $0\le \omega _r(f,\eta )-\omega _r(f,\delta )\le 2^{r}r\omega (f,\eta -\delta )$, and $\eta ^{-r}\omega _r(f,\eta )\le 2^{r}\delta ^{-r}\omega _r(f,\delta )$;
4. $\omega _r(f,p\delta )\le p^{r}\omega _r(f,\delta )$ for any $p\in N^{+}$;
5. $\omega _r(f,q\delta )\le \omega _r(f,[q+1]\delta )\le (q+1)^{r}\omega _r(f,\delta )$ for any real number $q>0$, where $[\cdot ]$ denotes the integer part;
6. If f has continuous derivatives up to order r, then $\omega _r(f,\delta )\le \delta ^{r}||f^{(r)}||_{C}$, and $\omega _{r+s}(f,\delta )\le \delta ^{r}\omega _s(f^{(r)},\delta )$.

4.1 Upper bound of approximation

For continuous functions defined on [a, d], we can prove the following upper bound estimates for the approximation by the FNNs. We now give the main results of this subsection.

Theorem 4.1

Let $f\in C[a,d]$ be fixed. Then

$$\begin{aligned}&||N_{n, 1}(f, x)-f(x)||_{\infty }\le 4\omega _2\left( f, \frac{b-a}{n}\right) ,\quad \forall ~ n\in N^+, \end{aligned}$$

(4.1)

$$\begin{aligned}&||N_{n, 2}(f, x)-f(x)||_{\infty }\le 4\omega _2\left( f, \frac{b-a}{n}\right) ,\quad \forall ~ n\in N^+, \end{aligned}$$

(4.2)

$$\begin{aligned}&||N_{n, s}(f, x)-f(x)||_{\infty }\le \frac{2}{\varphi _s(\frac{K_s}{2})}\omega _2\left( f, \frac{b-a}{n}\right) ,\quad \forall ~n, s\in N^+. \end{aligned}$$

(4.3)

Remark 4.1

Theorem 4.1 is the direct theorem of approximation for the three kinds of FNNs. These upper bounds are inspired by the results originally obtained in [22]. There, the cases covered were $\varphi _1(x)$ with $c=1$, $\mu _0=1/2$ and step $h=\frac{b-a}{n}$, and $\varphi _2(x)$ with $\mu _0=1/2$ and step $h=\frac{b-a}{n}$. Moreover, Eqs. (4.1) and (4.3) sharpen the results proved in [22] (for instance, if $g(x)=x^n, n\in N$, is a polynomial, then $\omega (g, t)={\mathcal {O}}(t)$, while $\omega _2(g, t)={\mathcal {O}}(t^2)$), and Eq. (4.2) is a completely new result.

Proof

For each $x\in [a,d]$, property P$_1$ gives

$$\begin{aligned} \sum _{k=0}^n\varphi _j\left( \frac{n(x-x_k)}{2(b-a)}\right) = \sum _{k=0}^n\varphi _j\left( \frac{n|x-x_k|}{2(b-a)}\right) \ge \varphi _j\left( \frac{n|x-x_i|}{2(b-a)}\right) ,\quad j=1, 2, \end{aligned}$$

(4.4)

where $i\in \{0,1,\ldots ,n\}$ satisfies $|x-x_i|\le \mu _0h$. Thus,

$$\begin{aligned} \frac{n|x-x_i|}{2(b-a)}\le \frac{n\mu _0h}{2(b-a)}=\mu _0, \end{aligned}$$

(4.5)

By Eqs. (4.4), (4.5) and P$_2$, we have

$$\begin{aligned} \sum _{k=0}^n\varphi _j\left( \frac{n(x-x_k)}{2(b-a)}\right) \ge \varphi _j\left( \frac{n|x-x_i|}{2(b-a)}\right) \ge \varphi _j(\mu _0)=\left\{ \begin{array}{ll} \frac{1}{2}c, &{}\quad j=1,\\ c, &{}\quad j=2. \end{array} \right. \end{aligned}$$

(4.6)

In addition, for any bounded and measurable function $f: [a,d]\rightarrow R$, we obtain

$$\begin{aligned} |N_{n, j}(f, x)|\le ||f||_{\infty }\frac{\sum _{k=0}^n\varphi _j \left( \frac{n(x-x_k)}{2(b-a)}\right) }{\sum _{k=0}^n \varphi _j\left( \frac{n(x-x_k)}{2(b-a)}\right) } =||f||_{\infty }<+\infty ,\quad j=1, 2, \end{aligned}$$

for each $x\in [a,d]$, where $||f||_{\infty }:= \sup _{x\in [a,d]}|f(x)|$.

For each $x\in [a,d]$, by (4.6) we have

$$\begin{aligned} \big |N_{n, 1}(f, x)-f(x)\big |= & {} \frac{\left| \sum _{k=0}^nf(x_k)\varphi _1 \left( \frac{n(x-x_k)}{2(b-a)}\right) -f(x)\sum _{k=0}^n\varphi _1\left( \frac{n(x-x_k)}{2(b-a)}\right) \right| }{\sum _{k=0}^n\varphi _1\left( \frac{n(x-x_k)}{2(b-a)}\right) }\\\le & {} \frac{2}{c}\left| \sum _{k=0}^nf(x_k)\varphi _1 \left( \frac{n(x-x_k)}{2(b-a)}\right) -f(x)\sum _{k=0}^n\varphi _1\left( \frac{n(x-x_k)}{2(b-a)}\right) \right| \\\le & {} \frac{2}{c}\sum _{k=0}^n\left| f(x_k)-f(x)\right| \varphi _1\left( \frac{n(x-x_k)}{2(b-a)}\right) , \end{aligned}$$

for every fixed $n\in N^+$. We now choose $i\in \{0,1,\ldots ,n-1\}$ such that $x_i\le x\le x_{i+1}$, then

$$\begin{aligned} \left| N_{n, 1}(f,x)-f(x)\right|\le & {} \frac{2}{c}\left[ \sum _{k=0,k\ne i,i+1}^n|f(x_k)-f(x)|\varphi _1\left( \frac{n(x-x_k)}{2(b-a)}\right) \right. \\&\left. +\,|f(x_i)-f(x)|\varphi _1\left( \frac{n(x-x_i)}{2(b-a)}\right) \right. \\&\left. +\,|f(x_{i+1})-f(x)|\varphi _1\left( \frac{n(x-x_{i+1})}{2(b-a)}\right) \right] \\=: & {} \frac{2}{c}[I_1+I_2+I_3]. \end{aligned}$$

Obviously, for $k\ne i, i+1$, we obtain $\frac{n|x-x_k|}{2(b-a)}\ge \frac{nh}{2(b-a)}=1$, then by the properties P$_1$, P$_2$ and P$_3$, we have

$$\begin{aligned} 0\le \varphi _1\left( \frac{n(x-x_k)}{2(b-a)}\right) =\varphi _1 \left( \frac{n|x-x_k|}{2(b-a)}\right) \le \varphi _1\left( \frac{nh}{2(b-a)}\right) =\varphi _1(1)=0, \end{aligned}$$

which implies $I_1=0$. Since $|x_i-x|\le h $ and $|x_{i+1}-x|\le h$, we obtain

$$\begin{aligned} |f(x_i)-f(x)| \le \omega _2\big (f,\frac{b-a}{n}\big ). \end{aligned}$$

Similarly,

$$\begin{aligned} |f(x_{i+1})-f(x)|\le \omega _2\left( f,\frac{b-a}{n}\right) . \end{aligned}$$

Finally, we obtain

$$\begin{aligned} I_1+I_2+I_3=I_2+I_3\le 2c\omega _2\left( f,\frac{b-a}{n}\right) . \end{aligned}$$

This completes the proof of (4.1) in Theorem 4.1. The proof of (4.2) follows by the same method used for (4.1), and we omit the details. Next, we prove (4.3).

Using a technique similar to that of Eqs. (4.4)–(4.6), it is easy to prove that the FNNs (3.6) are well defined for arbitrary $n\in N^+$ and that

$$\begin{aligned} \sum _{k=0}^n\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) \ge \varphi _s\left( \frac{K_s}{2}\right) >0. \end{aligned}$$

Now, for each $x\in [a, d]$, and $f\in C[a,d]$, there exists $i\in \{0, 1,2,\ldots ,n-1\}$ such that $x_i\le x\le x_{i+1}$, and then we have

$$\begin{aligned} \left| N_{n, s}(f, x)-f(x)\right|= & {} \frac{\left| \sum _{k=0}^nf(x_k)\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) -f(x)\sum _{k=0}^n\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) \right| }{\sum _{k=0}^n\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) }\\\le & {} \frac{1}{\varphi _s(K_s/2)} \left| \sum _{k=0}^nf(x_k)\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) \right. \\&\left. -\,f(x)\sum _{k=0}^n\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) \right| \\\le & {} \frac{1}{\varphi _s(K_s/2)} \sum _{k=0}^n|f(x_k)-f(x)|\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) \\= & {} \frac{1}{\varphi _s(K_s/2)}\left[ \sum _{k=0,k\ne i,i+1}^n|f(x_k)-f(x)|\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) \right. \\&\left. +\,|f(x_i)-f(x)|\varphi _s\left( K_s\frac{n(x-x_i)}{2(b-a)}\right) \right. \\&\left. +\,|f(x_{i+1})-f(x)|\varphi _s\left( K_s\frac{n(x-x_{i+1})}{2(b-a)}\right) \right] \\=: & {} \frac{1}{\varphi _s(K_s/2)}[J_1+J_2+J_3]. \end{aligned}$$

The terms $J_1, J_2$ and $J_3$ can be handled as in the proof of (4.1), and (4.3) follows immediately. This completes the proof of Theorem 4.1.
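A small experiment illustrates the rate in (4.1). For $f\in C^2$, property 6 gives $\omega _2(f,t)\le t^2||f''||_C$, so (4.1) predicts errors of order $n^{-2}$. The sketch below is restricted to $\mu _0=1/2$ and $c=1$. In that case $N_{n,1}$ reduces to piecewise linear interpolation at the nodes. The sketch estimates the sup-norm error for $f=\sin $ on $[0,\pi ]$ by dense sampling. The test function, grid size and helper names are our own choices.

```python
import math


def triangle(u, mu0=0.5):
    """phi_1 of Eq. (3.2) with c = 1."""
    return 1 - abs(u) / (2 * mu0) if abs(u) < 2 * mu0 else 0.0


def sup_error_N1(f, a, d, n, mu0=0.5, samples=2000):
    """Estimate ||N_{n,1}(f, .) - f||_inf on [a, d] by sampling a uniform grid."""
    h = (d - a) / n
    nodes = [a + k * h for k in range(n + 1)]
    vals = [f(xk) for xk in nodes]
    err = 0.0
    for i in range(samples + 1):
        x = a + (d - a) * i / samples
        acts = [triangle((x - xk) / h, mu0) for xk in nodes]
        N = sum(v * g for v, g in zip(vals, acts)) / sum(acts)
        err = max(err, abs(N - f(x)))
    return err


errors = {n: sup_error_N1(math.sin, 0.0, math.pi, n) for n in (8, 16, 32)}
for n in (8, 16):
    # error ratio when the number of hidden units is doubled
    print(n, errors[n], errors[n] / errors[2 * n])
```

Doubling n should reduce the error by a factor close to 4. This is a numerical illustration for one parameter choice, not a check of the constants in Theorem 4.1.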

4.2 Lower bound of approximation

To date, many density results and upper bound estimates for the approximation of functions by the FNNs (1.1) in ${{\mathbb {N}}}_{n+1}^d(\phi )$ have been given by many researchers [2, 9, 23, 24, 26,27,28,29,30,31,32,33, 54,55,56,57,58,59,60,61,62,63,64,65]. However, an upper bound estimate controls only one side of the approximation error. It may therefore be too loose to reflect the approximation capability of the FNNs accurately. To characterize this capability more precisely, a lower bound estimate is also worth studying, since it reflects the worst-case approximation precision of the network.

We now give the main results of this subsection.

Theorem 4.2

Let $f\in C[a,d]$ be fixed. Then

$$\begin{aligned}&\omega _2\left( f,\frac{b-a}{n}\right) \le \frac{C}{n}\sum _{i=1}^n||N_{i, j}(f, x)-f(x)||_{\infty },\quad j=1,2,~n\in N^+, \end{aligned}$$

(4.7)

$$\begin{aligned}&\omega _2\left( f,\frac{b-a}{n}\right) \le \frac{C}{n}\sum _{i=1}^n||N_{i, s}(f, x)-f(x)||_{\infty },\quad \forall ~ s\in N^+. \end{aligned}$$

(4.8)

Here and hereafter, C that appears in different situations may be different but all are positive constants independent of n and f.

Remark 4.2

We emphasize that the results of this subsection are completely new. Theorem 4.2 is the converse theorem of approximation for the three kinds of FNNs. It gives lower bound estimates on the approximation precision of these FNNs. Specifically, the average of the approximation errors of $N_{1}, \ldots , N_{n}$ is bounded from below by a constant multiple of the second order modulus of smoothness of f.

To prove Theorem 4.2, we use the classical Bernstein polynomials and the extended Bernstein polynomials as two basic tools. For $f\in C[0,1]$, the sequence of Bernstein polynomials of f(x) is defined by [66]

$$\begin{aligned} B_n(f,x):=\sum _{k=0}^nf\left( \frac{k}{n}\right) {n\atopwithdelims ()k}x^k(1-x)^{n-k},\quad x\in [0,1]. \end{aligned}$$

Similarly, if $f\in C[a,d]$, then we define the extended Bernstein polynomials for f(x) as follows

$$\begin{aligned} {\mathrm{EB}}_n(f,x):=\sum _{k=0}^nf\left( a+\frac{k}{n}(d-a)\right) {n\atopwithdelims ()k}\left( \frac{x-a}{d-a}\right) ^k\left( 1-\frac{x-a}{d-a}\right) ^{n-k},\quad x\in [a,d]. \end{aligned}$$
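The extended Bernstein polynomial is easy to evaluate directly. The following sketch (function name ours) implements ${\mathrm{EB}}_n(f,x)$. With $a=0$ and $d=1$ it reduces to $B_n(f,x)$.

```python
import math


def extended_bernstein(f, a, d, n, x):
    """EB_n(f, x) on [a, d]; with a = 0 and d = 1 this is the classical B_n(f, x)."""
    t = (x - a) / (d - a)  # map [a, d] onto [0, 1]
    return sum(
        f(a + k * (d - a) / n) * math.comb(n, k) * t ** k * (1 - t) ** (n - k)
        for k in range(n + 1)
    )
```

Two standard checks apply. First, ${\mathrm{EB}}_n$ reproduces linear functions and interpolates f at $x=a$ and $x=d$. Second, on [0, 1] one has $B_n(x^2,x)=x^2+x(1-x)/n$.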

For nearly half a century, the Bernstein polynomials and their modifications have attracted much interest, and many interesting results on the classical Bernstein polynomials have been obtained [66,67,68,69].

The following fundamental result on the classical Bernstein polynomials is well known [55]. For the reader's convenience, we state it, together with the other lemmas we need:

Lemma 4.1

([55]) Let $f\in C[0,1]$. Then there is a positive constant C such that

$$\begin{aligned} \omega _2\left( f;\frac{1}{n}\right) \le \frac{C}{n}\sum _{k=1}^n||B_k(f,x)-f(x)||_{\infty }. \end{aligned}$$

By the affine change of variables $x=a+(d-a)t$, which maps [0, 1] onto [a, d], Lemma 4.1 readily yields the following Lemma 4.2.

Lemma 4.2

Let $f\in C[a,d]$. Then there is a positive constant C such that

$$\begin{aligned} \omega _2\left( f;\frac{b-a}{n}\right) \le \omega _2\left( f;\frac{d-a}{n}\right) \le \frac{C}{n}\sum _{k=1}^n||{\mathrm{EB}}_k(f,x)-f(x)||_{\infty }. \end{aligned}$$

Lemma 4.3

([45]) Let $f\in C[a, d]$ and let $\sigma $ be a bounded measurable sigmoidal function on R. Then, for any $\epsilon >0$, there is a neural network $N_n(x)$ of the form (1.1) such that

$$\begin{aligned} |N_n(x)-f(x)|<\epsilon , \end{aligned}$$

for all $x\in [a,d]$, where $N_n(x)=\sum _{i=1}^nc_i\sigma (w_{i}x+\theta ), ~c_i, w_i, \theta \in R$.

We now prove Theorem 4.2. First, we need an equivalent expression for the extended Bernstein operator:

$$\begin{aligned} {\mathrm{EB}}_n(f,x)= & {} \sum _{k=0}^nf\left( a+\frac{k}{n}(d-a)\right) {n\atopwithdelims ()k}\left( \frac{x-a}{d-a}\right) ^k\left( 1-\frac{x-a}{d-a}\right) ^{n-k}\nonumber \\= & {} \sum _{k=0}^nf\left( a+\frac{k}{n}(d-a)\right) {n\atopwithdelims ()k}\left( \frac{x-a}{d-a}\right) ^k\nonumber \\&\quad \times \left( 1+C_{n-k}^1\left( -\frac{x-a}{d-a}\right) +\cdots +C_{n-k}^i \left( -\frac{x-a}{d-a}\right) ^i+\cdots +\left( -\frac{x-a}{d-a}\right) ^{n-k}\right) \nonumber \\= & {} \sum _{k=0}^nf\left( a+\frac{k}{n}(d-a)\right) {n\atopwithdelims ()k}\sum _{i=0}^{n-k}(-\,1)^i\frac{(n-k)!}{i!(n-k-i)!}\left( \frac{x-a}{d-a}\right) ^{i+k}\nonumber \\= & {} \sum _{k=0}^n\sum _{i=0}^{n-k}f\left( a+\frac{k}{n}(d-a)\right) {n\atopwithdelims ()k}(-\,1)^i\frac{(n-k)!}{i!(n-k-i)!}\left( \frac{x-a}{d-a}\right) ^{i+k}\nonumber \\= & {} \sum _{k=0}^n\sum _{i=0}^{n-k}d_{i,k}\left( \frac{x-a}{d-a}\right) ^{i+k},\quad x\in [a,d], \end{aligned}$$

(4.9)

where $d_{i,k}=(-1)^if (a+\frac{k}{n}(d-a)){n\atopwithdelims ()k}\frac{(n-k)!}{i!(n-k-i)!}$.
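The expansion (4.9) can be verified numerically. The sketch below accumulates the coefficients $d_{i,k}$ into the coefficient of $t^{i+k}$, where $t=(x-a)/(d-a)$. It evaluates the resulting power form by Horner's rule and compares it with the Bernstein form. The function names are illustrative.

```python
import math


def eb_direct(f, a, d, n, x):
    """EB_n(f, x) evaluated from its definition (Bernstein form)."""
    t = (x - a) / (d - a)
    return sum(f(a + k * (d - a) / n) * math.comb(n, k) * t ** k * (1 - t) ** (n - k)
               for k in range(n + 1))


def eb_power_coefficients(f, a, d, n):
    """Coefficients of t^m obtained by summing d_{i,k} over i + k = m, as in Eq. (4.9)."""
    coeffs = [0.0] * (n + 1)
    for k in range(n + 1):
        fk = f(a + k * (d - a) / n)
        for i in range(n - k + 1):
            # d_{i,k} = (-1)^i f(a + k(d-a)/n) C(n,k) (n-k)! / (i! (n-k-i)!)
            coeffs[i + k] += (-1) ** i * fk * math.comb(n, k) * math.comb(n - k, i)
    return coeffs


def eb_power_form(coeffs, a, d, x):
    """Evaluate sum_m coeffs[m] * t^m, t = (x - a)/(d - a), by Horner's rule."""
    t = (x - a) / (d - a)
    acc = 0.0
    for coef in reversed(coeffs):
        acc = acc * t + coef
    return acc
```

The alternating signs in $d_{i,k}$ make the power form increasingly sensitive to rounding as n grows, which is why a small n is appropriate for such a check.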

Second, let r be a fixed integer and let $P_r(x)=a_rx^r, x\in [a, d]$, be a univariate polynomial of degree r. By Lemma 4.3, there is an FNN of the form (1.1) with at least $r+1$ hidden units such that

$$\begin{aligned} |N_n(x)-P_r(x)|<\epsilon . \end{aligned}$$

In (4.9), each term is a univariate polynomial in x of degree $i+k~ (i+k\le n)$. Therefore, it can be approximated arbitrarily well by an FNN of the form

$$\begin{aligned} N_{K_{i+k}}(x)=\sum _{l=1}^{K_{i+k}}c_{l,i+k}\sigma (w_{l,i+k}x+\theta ),\quad K_{i+k}\ge i+k+1. \end{aligned}$$

(4.10)

Because $B_n(f,x)$ and ${\mathrm{EB}}_n(f,x)$ can approximate f, the following FNNs

$$\begin{aligned} \sum _{k=0}^n\sum _{i=0}^{n-k}d_{i,k}\sum _{l=1}^{K_{i+k}} c_{l,i+k}\sigma (w_{l,i+k}x+\theta ),\quad c_{l,i+k},w_{l,i+k}\in R,\quad K_{i+k}\ge i+k+1 \end{aligned}$$

can approximate f to any accuracy. Consequently, the networks (4.10) are the FNN models used in this subsection.

Third, by Lemma 4.3, the monomial $x^{i+k}~(i+k\le n)$ can be approximated by a network of the form

$$\begin{aligned} N_{K_{i+k}}=\sum _{l=1}^{K_{i+k}}c_{l,i+k} \sigma (w_{l,i+k}x+\theta ),\quad c_{l,i+k},w_{l,i+k}\in R,\quad K_{i+k}\ge i+k+1, \end{aligned}$$

(4.11)

with approximation error

$$\begin{aligned} |N_{K_{i+k}}-x^{i+k}|<\epsilon . \end{aligned}$$

(4.12)
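The construction behind Lemma 4.3 is not repeated in this part of the proof, so the Python sketch below shows one standard way, assumed here rather than taken from the paper, to realize a monomial $x^r$ with exactly $r+1$ hidden units, matching $K_{i+k}\ge i+k+1$ in (4.11). It takes a central $r$-th difference of $w\mapsto \sigma (wx+\theta )$ at $w=0$, which approximates $h^r x^r\sigma ^{(r)}(\theta )$, and divides by the same difference at $x=1$. It assumes a smooth activation with $\sigma ^{(r)}(\theta )\ne 0$; the logistic function, $\theta =1$ and the step $h$ are illustrative choices (checked only for $r\le 4$), not the piecewise linear activations of Section 3.

```python
import math

def logistic(u):
    """Smooth sigmoidal activation (illustrative assumption, not the paper's sigma)."""
    return 1.0 / (1.0 + math.exp(-u))

def monomial_network(r, theta=1.0, h=1e-2, sigma=logistic):
    """Return (c, w, theta, sigma) with N(x) = sum_l c[l] * sigma(w[l] * x + theta) ~ x**r.

    Uses r + 1 hidden units: the central r-th difference of w -> sigma(w x + theta)
    at w = 0 equals h**r * x**r * sigma^(r)(theta) + O(h**(r + 2)).
    """
    w = [(r / 2 - l) * h for l in range(r + 1)]                 # input weights
    alt = [(-1) ** l * math.comb(r, l) for l in range(r + 1)]   # difference stencil
    # Same difference at x = 1; approximates h**r * sigma^(r)(theta), assumed nonzero.
    denom = sum(s * sigma(wl + theta) for s, wl in zip(alt, w))
    c = [s / denom for s in alt]                                # output weights
    return c, w, theta, sigma

def evaluate(net, x):
    """Forward pass of the (r+1)-unit network returned by monomial_network."""
    c, w, theta, sigma = net
    return sum(cl * sigma(wl * x + theta) for cl, wl in zip(c, w))
```

For instance, `evaluate(monomial_network(3), 0.5)` is close to $0.125$. Shrinking `h` reduces the truncation error but increases cancellation, which is why a moderate step is used.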

Equations (4.11) and (4.12) imply

$$\begin{aligned} ||N_m(f,x)-{\mathrm{EB}}_m(f,x)||_{\infty }= & {} \left| \left| \sum _{k=0}^m\sum _{i=0}^{m-k}d_{i,k}\left\{ x^{i+k}-\sum _{l=1}^{K_{i+k}} c_{l,i+k}\sigma (\omega _{l,i+k}x+\theta )\right\} \right| \right| _{\infty }\\\le & {} \sum _{k=0}^m\sum _{i=0}^{m-k}|d_{i,k}|\max _{x\in [a,b]} \left| x^{i+k}-N_{K_{i+k}}\right| \\\le & {} \epsilon \sum _{k=0}^m\sum _{i=0}^{m-k}|d_{i,k}|. \end{aligned}$$
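The bound above uses only the triangle inequality, and its constant $\sum _{k}\sum _{i}|d_{i,k}|$ can be large. The Python sketch below (helper names hypothetical) replaces every monomial $t^{i+k}$, $t=(x-a)/(d-a)$, by a stand-in with error at most $\epsilon $ (random perturbations here, playing the role of the networks $N_{K_{i+k}}$) and checks the inequality. For $f\equiv 1$ the constant equals $\sum _k\binom{n}{k}2^{n-k}=3^n$, so $\epsilon $ must be small compared with $3^{-n}$ for the estimate to be informative.

```python
from math import comb
import random

def d_coeffs(f, n, a, d):
    """Coefficients d_{i,k} of (4.9), keyed by (i, k)."""
    return {(i, k): (-1) ** i * f(a + k * (d - a) / n) * comb(n, k) * comb(n - k, i)
            for k in range(n + 1) for i in range(n - k + 1)}

def error_constant(f, n, a, d):
    """Sum of |d_{i,k}|: the factor multiplying epsilon in the bound."""
    return sum(abs(v) for v in d_coeffs(f, n, a, d).values())

def perturbed_gap(f, n, a, d, x, eps, seed=0):
    """|EB_n(f, x) - approximation| when each monomial t**(i+k) is replaced by a
    stand-in with error at most eps (random stand-ins; hypothetical illustration)."""
    rng = random.Random(seed)
    t = (x - a) / (d - a)
    exact = approx = 0.0
    for (i, k), dik in d_coeffs(f, n, a, d).items():
        mono = t ** (i + k)
        exact += dik * mono
        approx += dik * (mono + rng.uniform(-eps, eps))
    return abs(exact - approx)
```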

Next, taking $f(x_k)=\sum _{i=0}^{m-k}\sum _{l=1}^{K_{i+k}}d_{i,k}c_{l,i+k}$, $\varphi _j=\sigma $, $\omega _{l,i+k}=\frac{n}{2(b-a)}$ and $\theta =-\frac{n}{2(b-a)}x_k$ in (3.4), and using (4.6), we obtain

$$\begin{aligned} \left| \left| N_{m, j}(f, x)-{\mathrm{EB}}_m(f,x)\right| \right| _{\infty }= & {} \left| \left| \frac{\sum _{k=0}^mf(x_k) \varphi _j\left( \frac{n(x-x_k)}{2(b-a)}\right) }{\sum _{k=0}^m\varphi _j\left( \frac{n(x-x_k)}{2(b-a)}\right) }-{\mathrm{EB}}_m(f,x) \right| \right| _{\infty }\\\le & {} \frac{2}{c}\left| \left| \sum _{k=0}^mf(x_k)\varphi _j \left( \frac{n(x-x_k)}{2(b-a)}\right) -{\mathrm{EB}}_m(f,x)\right| \right| _{\infty }\\\le & {} C\left| \left| \sum _{k=0}^mf(x_k)\varphi _j \left( \frac{n(x-x_k)}{2(b-a)}\right) -{\mathrm{EB}}_m(f,x)\right| \right| _{\infty }\\= & {} C\left| \left| \sum _{k=0}^m\sum _{i=0}^{m-k}d_{i,k}\sum _{l=1}^{K_{i+k}} c_{l,i+k}\sigma (w_{l,i+k}x+\theta )-{\mathrm{EB}}_m(f,x)\right| \right| _{\infty }\\= & {} C||N_m(f, x)-{\mathrm{EB}}_m(f, x)||_{\infty }\\\le & {} C\epsilon \sum _{k=0}^m\sum _{i=0}^{m-k}|d_{i,k}|. \end{aligned}$$

Finally, for the constructed FNN

$$\begin{aligned} \sum _{k=0}^n\sum _{i=0}^{n-k}d_{i,k}\sum _{l=1}^{K_{i+k}} c_{l,i+k}\sigma (w_{l,i+k}x+\theta ),\quad c_{l,i+k},w_{l,i+k}\in R,\quad K_{i+k}\ge i+k+1, \end{aligned}$$

we obtain the following lower bound for $||N_{n,j}(f,\cdot )-f||_{\infty }, ~j=1,2$:

$$\begin{aligned} \omega _2\left( f,\frac{b-a}{n}\right)\le & {} \frac{C}{n}\sum _{k=1}^n||{\mathrm{EB}}_k(f, x)-f(x)||_{\infty }\\\le & {} \frac{C}{n}\sum _{k=1}^n\left\{ ||{\mathrm{EB}}_k(f, x)-N_{k,j}(f, x)||_{\infty }+||N_{k,j}(f, x)-f(x)||_{\infty }\right\} \\\le & {} \frac{C}{n}\sum _{k=1}^n||N_{k,j}(f, x)-f(x)||_{\infty }+\frac{C\epsilon }{n}\sum _{k=0}^n \sum _{i=0}^{n-k}\sum _{l=1}^{K_{i+k}}|d_{i,k}|. \end{aligned}$$

Letting $\epsilon $ tend to zero, we obtain

$$\begin{aligned} \omega _2\left( f,\frac{b-a}{n}\right) \le \frac{C}{n}\sum _{k=1}^n||N_{k,j}(f, x)-f(x)||_{\infty }, j=1,2. \end{aligned}$$

This proves the first part of Theorem 4.2. The second part follows by the same argument, so we omit the details.

4.3 Essential order of approximation

If the arithmetic means on the right-hand sides of Eqs. (4.1), (4.2), (4.7) and of Eqs. (4.3), (4.8) could be replaced by $||N_{n,j}(f, \cdot )-f(\cdot )||_{\infty }, j=1,2$ and $||N_{n,s}(f, \cdot )-f(\cdot )||_{\infty }$, respectively, then Theorems 4.1 and 4.2 would give matching upper and lower bounds for approximation by the FNNs $N_{n,j}(f), j=1,2$, and $N_{n,s}(f)$: both would be of the order of the second-order modulus of smoothness of the target function f. Namely,

$$\begin{aligned} \omega _2\left( f,\frac{b-a}{n}\right) \sim ||N_{n,j}(f, \cdot )-f(\cdot )||_{\infty },\quad j=1,2, \end{aligned}$$

and

$$\begin{aligned} \omega _2\left( f,\frac{b-a}{n}\right) \sim ||N_{n,s}(f, \cdot )-f(\cdot )||_{\infty },\quad s\in N^{+}. \end{aligned}$$

At present we cannot settle this question for arbitrary function classes. We can, however, settle it when f belongs to the second-order Lipschitz class ${\mathrm{Lip}}(\alpha )_2$, $0<\alpha \le 2$. In this case, Theorem 4.3 follows directly by combining Theorem 4.1 with Theorem 4.2.

Theorem 4.3

Let $f\in C[a,d]$ and $0<\alpha \le 2$ be fixed. Then

$$\begin{aligned}&||N_{n, j}(f, x)-f(x)||_{\infty }=O(n^{-\alpha })\Leftrightarrow f\in {\mathrm{Lip}}(\alpha )_2,\quad {\mathrm{for}}\;\; j=1, 2,\\&||N_{n, s}(f, x)-f(x)||_{\infty }=O(n^{-\alpha })\Leftrightarrow f\in {\mathrm{Lip}}(\alpha )_2,\quad {\mathrm{for}}\;\; s\in N^+. \end{aligned}$$

Remark 4.3

We emphasize that the results of this subsection are new. Theorem 4.3 shows that the essential approximation order of these three kinds of FNNs is $O(n^{-\alpha })$, so their approximation capability is determined entirely by the smoothness of the target function: the smoother f is (the larger $\alpha $), the higher the precision of approximation. Conversely, an error of order $O(n^{-\alpha })$ can be attained only if $f\in {\mathrm{Lip}}(\alpha )_2$.
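Theorem 4.3 ties the approximation order to membership in ${\mathrm{Lip}}(\alpha )_2$, i.e., to how fast $\omega _2(f,\delta )$ decays, assuming the usual definition $\omega _2(f,\delta )=O(\delta ^\alpha )$. A rough way to see this exponent numerically is to estimate $\omega _2(f,\delta )=\sup _{0<h\le \delta }\sup _x|f(x+h)-2f(x)+f(x-h)|$ on a grid and read off the slope of $\log \omega _2$. The Python sketch below does this; the grid sizes and the test functions are our own choices, and a grid estimate only bounds the true supremum from below.

```python
import math

def omega2(f, delta, a, d, n_h=50, n_x=401):
    """Grid estimate of the second-order modulus of smoothness
    sup_{0 < h <= delta} sup_{x-h, x+h in [a, d]} |f(x+h) - 2 f(x) + f(x-h)|.
    n_h and n_x are arbitrary grid sizes chosen for illustration."""
    best = 0.0
    for p in range(1, n_h + 1):
        h = delta * p / n_h
        lo, hi = a + h, d - h              # keep x - h and x + h inside [a, d]
        for j in range(n_x):
            x = lo + j * (hi - lo) / (n_x - 1)
            best = max(best, abs(f(x + h) - 2.0 * f(x) + f(x - h)))
    return best

def empirical_order(f, a, d, delta=0.01):
    """Slope of log omega_2 between delta and 2*delta; roughly alpha when
    omega_2(f, delta) behaves like delta**alpha."""
    w1, w2 = omega2(f, delta, a, d), omega2(f, 2 * delta, a, d)
    return math.log(w2 / w1) / math.log(2.0)
```

On $[-1,1]$, `empirical_order(lambda x: x ** 3, -1.0, 1.0)` is close to 2, while `empirical_order(lambda x: abs(x) ** 0.5, -1.0, 1.0)` is close to 1/2.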

Remark 4.4

Theorems 4.1, 4.2 and 4.3 answer Problems 2.1 and 2.2 affirmatively. In the FNNs $N_{n, 1}$ and $N_{n, 2}$, the connection weights and the thresholds are equal to $\frac{n}{2(b-a)}$ and $\frac{-nx_k}{2(b-a)}$, respectively. In the FNNs $N_{n, s}$, they are equal to $K_s\frac{n}{2(b-a)}=\mu _0(s+1)\frac{n}{2(b-a)}$ and $-K_s\frac{nx_k}{2(b-a)}=-\mu _0(s+1)\frac{nx_k}{2(b-a)}$, respectively. Since $a, b\in R$ and $\mu _0\in R^+$ are constants, $s\in N^+$ is a fixed positive integer and $n\in N^+$ determines the number of hidden neurons, all connection weights and thresholds are given explicitly, satisfy the approximation conditions and require no training. These results quantify the approximation precision of the FNNs and characterize the relationship among the precision of approximation, the number of hidden neurons and the smoothness of the target function.
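Remark 4.4 stresses that every parameter of $N_{n,1}$ is written down in closed form. The Python sketch below builds those parameters and runs a forward pass with no training step. It assumes $c=1$, the ramp form of $\varphi _1$ displayed in Example 5.1, and nodes $x_k=a+2k(b-a)/n$, which is what the argument $\frac{n}{2(b-a)}(x-a)-k$ in (5.1) implies; `phi1`, `build_fixed_network` and `forward` are hypothetical names.

```python
def phi1(t, mu0=0.25):
    """Ramp activation of Example 5.1 with c = 1 (assumption):
    1 - |t|/(2 mu0) for |t| < 2 mu0, and 0 otherwise."""
    return max(0.0, 1.0 - abs(t) / (2.0 * mu0))

def build_fixed_network(f, n, a, b):
    """Closed-form parameters of N_{n,1}; nothing is trained.
    Nodes x_k = a + 2k(b-a)/n are an assumption read off (5.1)."""
    w = n / (2.0 * (b - a))                          # input-to-hidden weight
    nodes = [a + 2.0 * k * (b - a) / n for k in range(n + 1)]
    thetas = [-w * xk for xk in nodes]               # thresholds -n x_k / (2(b-a))
    outs = [f(xk) for xk in nodes]                   # hidden-to-output weights f(x_k)
    return w, thetas, outs, nodes

def forward(params, x, mu0=0.25):
    """sum_k f(x_k) phi1(w x + theta_k) / sum_k phi1(w x + theta_k);
    returns 0.0 when no unit is active, as in the first case of (5.1)."""
    w, thetas, outs, _ = params
    acts = [phi1(w * x + th, mu0) for th in thetas]
    den = sum(acts)
    if den == 0.0:
        return 0.0
    return sum(o * s for o, s in zip(outs, acts)) / den
```

With `params = build_fixed_network(lambda x: x ** 3, 5, -1.0, 0.0)` the nodes are $-1.0, -0.6, -0.2, 0.2, 0.6, 1.0$; at $x=0.1$ only the unit centred at $0.2$ is active, so `forward(params, 0.1)` returns $f(0.2)$.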

4.4 Interpolation

The following interpolation (exact approximation) results hold for the FNNs $N_{n, j},~j=1, 2$, and $N_{n, s},~s\in N^+$.

Theorem 4.4

If $f: [a,d]\rightarrow R$ is a bounded and measurable function and $n\in N^+$, then

$$\begin{aligned} N_{n, 1}(f, x_i)& = {} f(x_i),\quad {\mathrm{for}}\;{\mathrm{every}}\; i=0,1,\ldots ,n, \end{aligned}$$

(4.13)

$$\begin{aligned} N_{n, 2}(f, x_i)& = {} f(x_i),\quad {\mathrm{for}}\;{\mathrm{every}}\; i=0,1,\ldots ,n, \end{aligned}$$

(4.14)

$$\begin{aligned} N_{n, s}(f, x_i)& = {} f(x_i),\quad {\mathrm{for}}\;{\mathrm{every}}\; i=0,1,\ldots , n\;\;{\mathrm{and}}\;\; s\in N^+. \end{aligned}$$

(4.15)

Remark 4.5

Theorem 4.4 is inspired by results originally obtained in [22] for $\varphi _1(x)$ with $c=1$, $\mu _0=1/2$ and step $h=\frac{b-a}{n}$, and for $\varphi _2(x)$ with $\mu _0=1/2$ and step $h=\frac{b-a}{n}$. Eqs. (4.13) and (4.15) slightly extend the results proved in [22], and Eq. (4.14) is a completely new result.

Proof

Let $i\in \{0,1,\ldots ,n\}$ be fixed. If $k=i$, then we obtain

$$\begin{aligned} \varphi _j\left( \frac{n(x_i-x_k)}{2(b-a)}\right) =\varphi _j(0)=c,\quad j=1,2. \end{aligned}$$

If instead $k\ne i$, then

$$\begin{aligned} \frac{n|x_i-x_k|}{2(b-a)}\ge \frac{nh}{2(b-a)}=1. \end{aligned}$$

Hence, using $0<\mu _0\le \frac{1}{2}$ and properties P$_1$ and P$_2$, we obtain

$$\begin{aligned} 0=\varphi _j(2\mu _0)=\varphi _j(1)\ge \varphi _j \left( \frac{n|x_i-x_k|}{2(b-a)}\right) =\varphi _j \left( \frac{n(x_i-x_k)}{2(b-a)}\right) \ge 0,\quad j=1, 2. \end{aligned}$$

Therefore, we get

$$\begin{aligned} \varphi _j\left( \frac{n(x_i-x_k)}{2(b-a)}\right) = \left\{ \begin{array}{ll} c, &{}\quad i=k,\\ 0, &{}\quad i\ne k,\\ \end{array} \right. \quad j=1, 2 \end{aligned}$$

for each $i, k=0,1,\ldots ,n$, and this implies that

$$\begin{aligned} N_{n, j}(f, x_i)=\frac{\sum _{k=0}^nf(x_k)\varphi _j\left( \frac{n(x_i-x_k)}{2(b-a)}\right) }{\sum _{k=0}^n\varphi _j\left( \frac{n(x_i-x_k)}{2(b-a)}\right) }=\frac{c\,f(x_i)}{c}=f(x_i),\quad j=1, 2, \end{aligned}$$

for any $i\in \{0,1,\ldots ,n\}$. This proves the first part of Theorem 4.4, that is, (4.13) and (4.14).
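The proof hinges on the collocation matrix $\left( \varphi _j\left( \frac{n(x_i-x_k)}{2(b-a)}\right) \right) _{i,k}$ being $c$ times the identity. The Python sketch below checks this for $\varphi _1$ (taken in the ramp form of Example 5.1, scaled by $c$) for a few values of $\mu _0$, and shows that it fails once $\mu _0>\frac{1}{2}$. It assumes, as the inequality $\frac{nh}{2(b-a)}=1$ in the proof implies, that the argument $\frac{n(x_i-x_k)}{2(b-a)}$ reduces to the integer $i-k$; the function names are hypothetical.

```python
def phi1(t, mu0, c=1.0):
    """c * max(0, 1 - |t| / (2 mu0)): ramp activation of height c (assumed form)."""
    return c * max(0.0, 1.0 - abs(t) / (2.0 * mu0))

def collocation_matrix(n, mu0, c=1.0):
    """M[i][k] = phi1(n (x_i - x_k) / (2(b-a))), where the argument equals i - k."""
    return [[phi1(i - k, mu0, c) for k in range(n + 1)] for i in range(n + 1)]

def is_scaled_identity(m, c=1.0):
    """True when m equals c times the identity matrix."""
    size = len(m)
    return all(m[i][k] == (c if i == k else 0.0)
               for i in range(size) for k in range(size))
```

For $0<\mu _0\le \frac{1}{2}$ every off-diagonal entry vanishes, so each sum in $N_{n,j}(f,x_i)$ keeps only the term $k=i$; with $\mu _0=0.75$, for example, $\varphi _1(\pm 1)=1/3$ and the interpolation property is lost.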

Next, we prove the second part of Theorem 4.4. If $i\ne k$, $i,k=0,1,\ldots ,n$, then $|x_i-x_k|\ge h$, and hence $K_s\frac{n|x_i-x_k|}{2(b-a)}\ge K_s$. By the properties of $\varphi _s$, it follows that

$$\begin{aligned} \varphi _s\left( K_s\frac{n(x_i-x_k)}{2(b-a)}\right) = \varphi _s\left( K_s\frac{n|x_i-x_k|}{2(b-a)}\right) =0. \end{aligned}$$

The second part of Theorem 4.4, i.e., (4.15), then follows by the same argument as the first part; for brevity, we omit the details.

Remark 4.6

Theorem 4.4 establishes the interpolation property of these FNNs with fixed weights and thresholds. It strengthens Theorems 4.1 and 4.2 at the sample points and answers Problem 2.3.

5 Numerical results

Because continuous functions on bounded intervals are the usual target functions in engineering and other applications [1, 4–6, 14, 32], we focus here on the numerical approximation of a continuous function on a bounded interval. Theorems 4.1, 4.2, 4.3 and 4.4 show that any continuous function on a bounded interval [a, d] can be approximated by FNNs with an optimized piecewise linear activation function and fixed weights. We illustrate these theoretical results with different activation functions and show the error of the FNN approximation. All computations were performed in MATLAB 7.0.

Example 5.1

We take the continuous function $f(x)=x^3$ as the target function and study its approximation on the bounded interval [a, d] by the FNNs (3.4) with optimized piecewise linear activation functions and fixed weights.

By Eqs. (3.2) and (3.3), and letting $c=1$, we have

$$\begin{aligned} \varphi _1\left( \frac{n(x-x_k)}{2(b-a)}\right) =\left\{ \begin{array}{ll} 0, &{}\quad |x-x_k|\ge \frac{4(b-a)}{n}\mu _0,\\ 1-\frac{1}{2\mu _0}\left| \frac{n}{2(b-a)}(x-a)-k\right| , &{}\quad |x-x_k|<\frac{4(b-a)}{n}\mu _0, \end{array} \right. \end{aligned}$$

and

$$\begin{aligned} \varphi _2\left( \frac{n(x-x_k)}{2(b-a)}\right) =\left\{ \begin{array}{ll} 0, &{}\quad |x-x_k|\ge \frac{4(b-a)}{n}\mu _0,\\ 1-\frac{1}{\mu _0}\left| \frac{n}{2(b-a)}(x-a)-k\right| , &{}\quad \frac{2(b-a)}{n}\mu _0<|x-x_k|<\frac{4(b-a)}{n}\mu _0,\\ 1, &{}\quad |x-x_k|\le \frac{2(b-a)}{n}\mu _0. \end{array} \right. \end{aligned}$$

The FNNs $N_{n,1}$ and $N_{n,2}$ then reduce to

$$\begin{aligned} N_{n,1}(f)=\left\{ \begin{array}{ll} 0, &{}\quad |x-x_k|\ge \frac{4(b-a)}{n}\mu _0,\\ \frac{\sum _{k=0}^nf(x_k)\left( 1-\frac{1}{2\mu _0}\left| \frac{n}{2(b-a)}(x-a)-k\right| \right) }{\sum _{k=0}^n\left( 1-\frac{1}{2\mu _0}\left| \frac{n}{2(b-a)}(x-a)-k\right| \right) }, &{}\quad |x-x_k|<\frac{4(b-a)}{n}\mu _0, \end{array} \right. \end{aligned}$$

(5.1)

and

$$\begin{aligned} N_{n,2}(f)=\left\{ \begin{array}{ll} 0, &{}\quad |x-x_k|\ge \frac{4(b-a)}{n}\mu _0,\\ \frac{\sum _{k=0}^nf(x_k)\left( 1-\frac{1}{\mu _0}\left| \frac{n}{2(b-a)}(x-a)-k\right| \right) }{\sum _{k=0}^n\left( 1-\frac{1}{\mu _0}\left| \frac{n}{2(b-a)}(x-a)-k\right| \right) }, &{}\quad \frac{2(b-a)}{n}\mu _0<|x-x_k|<\frac{4(b-a)}{n}\mu _0,\\ \left| \frac{\sum _{k=0}^nf(x_k)}{n+1}-f(x)\right| , &{}\quad |x-x_k|\le \frac{2(b-a)}{n}\mu _0. \end{array} \right. \end{aligned}$$

(5.2)

where $x_k$, a, b and $\mu _0$ are as defined before.

Eqs. (5.1) and (5.2) show that, in the FNNs $N_{n, 1}$ and $N_{n, 2}$, the input-to-hidden connection weights, the hidden-to-output connection weights and the thresholds are equal to $\frac{n}{2(b-a)}$, $f(x_k)$ and $\frac{-\,nx_k}{2(b-a)}$, respectively. Since $a, b\in R$ and $\mu _0\in R^+$ are constants, the sample values $f(x_k)$ are known, and $n\in N^+$ is fixed, all weights and thresholds of $N_{n, 1}$ and $N_{n, 2}$ are fixed constants that require no training. The corresponding approximation errors are

$$\begin{aligned} \big |N_{n, 1}(f, x)-f(x)\big |=\left\{ \begin{array}{ll} |f(x)|, &{}\quad |x-x_k|\ge \frac{4(b-a)}{n}\mu _0,\\ \bigg |\frac{\sum _{k=0}^nf(x_k)\big (1-\frac{1}{2\mu _0}\big |\frac{n}{2(b-a)}(x-a)-k\big |\big )}{\sum _{k=0}^n\big (1-\frac{1}{2\mu _0}\big |\frac{n}{2(b-a)}(x-a)-k\big |\big )}-f(x)\bigg |, &{}\quad |x-x_k|<\frac{4(b-a)}{n}\mu _0, \end{array} \right. \end{aligned}$$

(5.3)

and

$$\begin{aligned}\big |N_{n, 2}(f, x)-f(x)\big | =\left\{ \begin{array}{ll} |f(x)|, &{}\quad |x-x_k|\ge \frac{4(b-a)}{n}\mu _0,\\ \bigg |\frac{\sum _{k=0}^nf(x_k)\big (1-\frac{1}{\mu _0}\big |\frac{n}{2(b-a)}(x-a)-k\big |\big )}{\sum _{k=0}^n\big (1-\frac{1}{\mu _0}\big |\frac{n}{2(b-a)}(x-a)-k\big |\big )}-f(x)\bigg |, &{}\quad \frac{2(b-a)}{n}\mu _0<|x-x_k|<\frac{4(b-a)}{n}\mu _0,\\ |1-f(x)|, &{}\quad |x-x_k|\le \frac{2(b-a)}{n}\mu _0. \end{array} \right. \end{aligned}$$

(5.4)

We set $a=-1$, $d=1$ and $\mu _0=\frac{1}{4}$ in Eqs. (5.3) and (5.4). Fig. 5 shows the numerical results on $[-\,1,1]$ for $n=5, 10, 20, 40, 80, 160$.
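The setting of Fig. 5 can be approximated in a few lines of Python. This sketch covers $N_{n,1}$ only and rests on assumptions read off the text rather than stated there: $c=1$, $b=0$ for $[a,d]=[-1,1]$ (consistent with the interpolation points listed in Example 5.2), nodes $x_k=a+2k(b-a)/n$, and output 0 where no unit is active, as in the first case of (5.1). With $\mu _0=\frac{1}{4}$ the supports of neighbouring units do not overlap, so away from the midpoints between nodes the network returns f at the nearest node; the evaluation grid is chosen to miss those midpoints. It reports the largest error on the grid for each n and is not the author's MATLAB code.

```python
def phi1(t, mu0):
    """Ramp activation of Example 5.1 with c = 1 (assumption)."""
    return max(0.0, 1.0 - abs(t) / (2.0 * mu0))

def n1(samples, n, a, b, x, mu0):
    """N_{n,1}(f, x) as in (5.1); samples[k] = f(x_k), x_k = a + 2k(b-a)/n."""
    scale = n / (2.0 * (b - a))
    acts = [phi1(scale * (x - a) - k, mu0) for k in range(n + 1)]  # phi1(n(x-x_k)/(2(b-a)))
    den = sum(acts)
    if den == 0.0:
        return 0.0                      # no active unit: first case of (5.1)
    return sum(fk * s for fk, s in zip(samples, acts)) / den

def sup_error(f, n, a, b, d, mu0=0.25, m=2048):
    """Largest |N_{n,1}(f, x) - f(x)| over a cell-centred grid on [a, d].
    With m = 2048 the grid misses the node midpoints for the n used in Fig. 5."""
    samples = [f(a + 2.0 * k * (b - a) / n) for k in range(n + 1)]
    grid = [a + (j + 0.5) * (d - a) / m for j in range(m)]
    return max(abs(n1(samples, n, a, b, x, mu0) - f(x)) for x in grid)
```

For example, `[sup_error(lambda x: x ** 3, n, -1.0, 0.0, 1.0) for n in (5, 10, 20, 40, 80, 160)]` gives one grid error per curve of Fig. 5; the values decrease as n grows.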

Example 5.2

We next take the continuous function $f(x)=x^3+x^2-5x+3$ as the target function and study its approximation on the bounded interval [a, d] by the FNNs (3.4) with optimized piecewise linear activation functions and fixed weights.

Again we set $a=-1$, $d=1$ and $\mu _0=\frac{1}{4}$ in Eqs. (5.3) and (5.4). Figs. 6 and 7 show numerical results in neighborhoods of the interpolation points for $n=5$ and $n=10$; Fig. 7 is a magnified view of Fig. 6. The interpolation points of $N_{5,1}(f)$ are −1.0, −0.6, −0.2, 0.2, 0.6, 1.0, and those of $N_{10,1}(f)$ are −1.0, −0.8, −0.6, −0.4, −0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.0.

Example 5.3

We take the nonnegative density functions $\varphi _s$ defined by Eq. (3.5) as activation functions. The theoretical error bound can be computed as in Example 5.1. We omit the graphs of the resulting networks $N_{n, s}(f, x)$, since they are almost identical to Figs. 6 and 7.

6 Conclusions and prospects

In this paper we have studied the approximation properties of FNNs from a mathematical point of view. First, we constructed optimized piecewise linear activation functions and, from them, three types of FNNs with fixed weights, and analyzed their structure. Second, we established ideal upper bounds, lower bounds and the essential order of approximation of these FNNs for continuous functions defined on bounded intervals. Third, we proved interpolation results showing that these FNNs reproduce the target function exactly at the sample points; in other words, the weights and thresholds of an exact neural approximation are obtained without training. Finally, numerical examples illustrate the effectiveness of the method. These conclusions further characterize the intrinsic approximation properties of these FNNs and reveal the relationship among the precision of approximation, the number of hidden units and the smoothness of the target function.

We wrap up this paper with the following prospects:

(a)
Although we have determined the essential approximation order of the three kinds of FNNs, this holds only when the target function f belongs to the Lipschitz classes considered here. The essential approximation order of these FNNs for other function classes deserves further study.
(b)
It is interesting and significant to extend the main theories in this paper to multivariate functions.

Neither problem is easy to solve, but both are important and worth pursuing.

References

[1] Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2:303–314
[2] Chen TP (1994) Approximation problems in system identification with neural networks. Sci China Ser A 24:1–7
[3] Ito Y (1996) Nonlinearity creates linear independence. Adv Comput Math 5:189–203
[4] Barhen J, Cogswell R, Protopopescu V (2000) Single iteration training algorithm for multilayer feedforward neural networks. Neural Process Lett 11:113–129
[5] Ito Y, Saito K (1996) Superposition of linearly independent functions and finite mappings by neural networks. Math Sci 21:27–33
[6] Ganjefar S, Tofighi M (2015) Single-hidden-layer fuzzy recurrent wavelet neural network: applications to function approximation and system identification. Inf Sci 294:269–285
[7] Chen TP, Chen H (1995) Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to a dynamic system. IEEE Trans Neural Netw 6:911–917
[8] Yoshifusa I (1991) Approximation of functions on a compact set by finite sums of sigmoid function without scaling. Neural Netw 4:817–826
[9] Sartori MA, Antsaklis PJ (1991) A simple method to derive bounds on the size and to train multilayer neural networks. IEEE Trans Neural Netw 24:467–471
[10] Hahm N, Hong BI (1999) Extension of localized approximation by neural networks. Bull Austral Math Soc 59:121–131
[11] Chui CK, Li X (1992) Approximation by ridge functions and neural networks with one hidden layer. J Approx Theory 70:131–141
[12] Ito Y (1991) Representation of functions by superpositions of a step or sigmoid function and their applications to neural network theory. Neural Netw 4:385–394
[13] Funahashi K (1989) On the approximate realization of continuous mapping by neural network. Neural Netw 2:183–192
[14] Quéau LM, Kimiaei M, Randolph MF (2015) An analysis of global robust stability of delayed dynamical neural networks. Eng Struct 92:172–185
[15] Li FJ (2011) Interpolation and convergence of Bernstein–Bézier coefficients. Acta Math Sin Engl Ser 27:1769–1782
[16] Cao FL, Lin SB, Xu ZB (2010) Approximation capability of interpolation neural networks. Neurocomputing 74:457–460
[17] Sontag ED (1992) Feedforward nets for interpolation and classification. J Comp Syst Sci 45:20–48
[18] Li X (2002) Interpolation by ridge polynomials and its application in neural networks. J Comput Appl Math 144:197–209
[19] Llanas B, Sainz FJ (2006) Constructive approximate interpolation by neural networks. J Comput Appl Math 188:283–308
[20] Llanas B, Lantarón S (2007) Hermite interpolation by neural networks. Appl Math Comput 191:429–439
[21] Li HX, Lee ES (2003) Interpolation functions of feedforward neural networks. Comput Math Appl 46:1861–1874
[22] Costarelli D (2014) Interpolation by neural network operators activated by ramp functions. J Math Anal Appl 419:574–582
[23] Stinchcombe MB (1999) Neural network approximation of continuous functionals and continuous functions on compactifications. Neural Netw 12:467–477
[24] Konovalov VN, Leviatan D, Maiorov VE (2009) Approximation of Sobolev classes by polynomials and ridge functions. J Approx Theory 159:97–108
[25] Anastassiou GA (2002) Univariate sigmoidal neural network approximation. J Comput Anal Appl 14:659–690
[26] Anastassiou GA (2011) Multivariate sigmoidal neural network approximation. Neural Netw 24:378–386
[27] Anastassiou GA (2011) Multivariate hyperbolic tangent neural network approximation. Comput Math Appl 61:809–821
[28] Cao F, Chen Z (2009) The approximation operators with sigmoidal functions. Comput Math Appl 58:758–765
[29] Cao F, Chen Z (2012) The construction and approximation of a class of neural networks operators with ramp functions. J Comput Anal Appl 14:101–112
[30] Cheang GHL (2010) Approximation with neural networks activated by ramp sigmoids. J Approx Theory 162:1450–1465
[31] Pinkus A (1999) Approximation theory of the MLP model in neural networks. Acta Numer 14:143–195
[32] Shristava Y, Dasgupta S (1990) Neural networks for exact matching of functions on a discrete domain. In: Proceedings of the 29th IEEE conference on decision and control. Honolulu, pp 1719–1724
[33] Huang GB, Babri HA (1998) Feedforward neural networks with arbitrary bounded nonlinear activation functions. IEEE Trans Neural Netw 9:224–229
[34] Costarelli D, Spigler R (2015) Approximation by series of sigmoidal functions with application to neural networks. Ann Mat Pura Appl 194:289–306
[35] Anastassiou GA (1997) Rate of convergence of some neural network operators to the unit-univariate case. J Math Anal Appl 212:237–262
[36] Anastassiou GA (2011) Intelligent systems: approximation by artificial neural networks. In: Intelligent systems reference library, vol 19. Springer, Berlin
[37] Cao F, Chen Z (2016) Scattered data approximation by neural networks operators. Neurocomputing 190:237–242
[38] Cao F, Chen Z (2015) The construction and approximation of feedforward neural network with hyperbolic tangent function. Appl Math A J Chin Univ 30:151–162
[39] Cardaliaguet P, Euvrard G (1992) Approximation of a function and its derivative with a neural network. Neural Netw 5:207–220
[40] Costarelli D, Spigler R (2013) Multivariate neural network operators with sigmoidal activation functions. Neural Netw 48:72–77
[41] Costarelli D, Spigler R (2014) Convergence of a family of neural network operators of the Kantorovich type. J Approx Theory 185:80–90
[42] Tamura S, Tateishi M (1997) Capabilities of a four-layered feedforward neural network. IEEE Trans Neural Netw 8:251–255
[43] Ismailov VE (2012) Approximation by neural networks with weights varying on a finite set of directions. J Math Anal Appl 389:72–83
[44] Nageswara SVR (1999) Simple sample bound for feedforward sigmoid networks with bounded weights. Neurocomputing 29:115–122
[45] Hahm N, Hong BI (2004) An approximation by neural networks with a fixed weight. Comput Math Appl 47:1897–1903
[46] Xu ZB, Cao FL (2004) The essential order of approximation for neural networks. Sci China 47:97–112
[47] Xu ZB, Cao FL (2005) Simultaneous Lp-approximation order for neural networks. Neural Netw 18:914–923
[48] Xu ZB, Wang JJ (2006) The essential order of approximation for nearly exponential type neural networks. Sci China 49:446–460
[49] Lin SB, Cao FL, Xu ZB (2011) Essential rate for approximation by spherical neural networks. Neural Netw 24:752–758
[50] Wang JJ, Xu ZB (2010) New study on neural networks: the essential order of approximation. Neural Netw 23:618–624
[51] Li FJ, Xu ZB (2008) The estimation of simultaneous approximation order for neural networks. Chaos, Solitons Fractals 36:572–580
[52] Li FJ, Xu ZB, Zhou YT (2008) The essential order of approximation for Suzuki’s neural networks. Neurocomputing 71:3525–3533
[53] Li FJ, Xu ZB (2007) The essential order of simultaneous approximation for neural networks. Appl Math Comput 194:120–127
[54] Butzer PL, Nessel RJ (1971) Fourier analysis and approximation. Pure Appl Math. Academic Press, New York
[55] Ditzian Z, Totik V (1987) Moduli of smoothness. SSCM. Springer, Berlin
[56] Suzuki S (1998) Constructive function approximation by three layer artificial neural networks. Neural Netw 11:1049–1058
[57] Chui CK, Li X, Mhaskar HN (2006) Limitations of the approximation capabilities of neural networks with one hidden layer. Adv Comput Math 5:233–243
[58] Ismailov VE (2014) On the approximation by neural networks with bounded number of neurons in hidden layers. J Math Anal Appl 417:963–969
[59] Arteaga C, Marrero I (2015) Wiener’s Tauberian theorems for the Fourier–Bessel transformation and uniform approximation by RBF networks of Delsarte translates. J Math Anal Appl 431:482–493
[60] Costarelli D, Spigler R (2013) Approximation results for neural network operators activated by sigmoidal functions. Neural Netw 44:101–106
[61] Yang S, Ting TO, Man KL, Gua SU (2013) Investigation of neural networks for function approximation. Procedia Comput Sci 17:586–594
[62] Anastassiou GA (2012) Fractional neural network approximation. Comput Math Appl 64:1655–1676
[63] Bordignon F, Gomide F (2014) Uninorm based evolving neural networks and approximation capabilities. Neurocomputing 127:13–20
[64] Vuković N, Miljković Z (2013) A growing and pruning sequential learning algorithm of hyper basis function neural network for function approximation. Neural Netw 46:210–226
[65] Costarelli D (2015) Neural network operators: constructive interpolation of multivariate functions. Neural Netw 67:28–36
[66] Anastassiou GA (2011) Univariate hyperbolic tangent neural network approximation. Math Comput Modell 53:1111–1132
[67] Ditzian Z (1989) Best polynomial approximation and Bernstein polynomial approximation on a simplex. Proc Kon Nederl Akad Wetensch 92:243–256
[68] Berens H, Xu Y (1991) K-moduli, moduli of smoothness, and Bernstein polynomials on a simplex. Indag Mathem NS 4:411–421
[69] Phillips GM (2003) Interpolation and approximation by polynomials. Springer, New York


Acknowledgements

Funded by “Major Innovation Projects for Building First-class Universities in China’s Western Region” ZKZD2017009.

Author information

Authors and Affiliations

School of Mathematics and Statistics, Ningxia University, Yinchuan, 750021, China
Feng-Jun Li


Corresponding author

Correspondence to Feng-Jun Li.

Ethics declarations

Conflict of interest

I declare that I have no financial or personal relationships with other people or organizations that could inappropriately influence my work; there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled “Constructive function approximation by neural networks with optimized activation functions and fixed weights”.


About this article

Cite this article

Li, FJ. Constructive function approximation by neural networks with optimized activation functions and fixed weights. Neural Comput & Applic 31, 4613–4628 (2019). https://doi.org/10.1007/s00521-018-3573-3


Received: 28 February 2018
Accepted: 30 May 2018
Published: 09 June 2018
Issue Date: September 2019
DOI: https://doi.org/10.1007/s00521-018-3573-3
