1 Introduction

Owing to their broad and powerful learning and expression capabilities, neural networks have been widely used in the real world, in almost all fields of natural science and in part of social science [1,2,3,4,5,6]. It is well known that the most widely used neural networks are the feed-forward neural networks (FNNs). Many practical problems related to FNN applications, such as pattern recognition, information processing, engineering technology, computer science, and systems control, can be converted into problems of learning (or approximating) multivariate functions by FNNs with optimized activation functions, for which an extensive study on approximation by FNNs has been carried out [7,8,9,10,11,12,13,14].

In recent years, interpolation (approximation with zero error, namely, exact approximation) by FNNs has been a hot topic in the theory and application of FNNs and their generalizations, attracting the attention of scholars all over the world [15,16,17,18,19,20,21,22].

The most widely used and studied neural networks are perhaps the FNNs with one hidden layer. The fundamental element of a neural network is known as a “neuron” or a “unit.” Neurons are arranged in layers. A FNN with one hidden layer consists of three layers: input layer, hidden layer and output layer. A sketch map of a FNN is exhibited in Fig. 1.

Fig. 1 The architecture of a single hidden layer FNN with n input neurons, m hidden neurons and p output neurons

A three-layer FNN with d input units, m hidden units and one output unit is mathematically represented in the following form

$$\begin{aligned} N({\mathbf{x}}):=\sum _{i=1}^mc_i\sigma \left( \sum _{j=1}^dw_{ij}x_j+\theta _{i}\right) , ~{\mathbf{x}}=(x_1,x_2,\ldots ,x_d)\in R^d,\quad d\ge 1, \end{aligned}$$
(1.1)

where \({\mathbf{w}}_i=(w_{i1},w_{i2},\ldots ,w_{id})^T\in R^d\) are the connection weights of unit i in the hidden layer with the input units, \(c_i\in R\) are the connection strengths of unit i with the output unit, \(\theta _i\in R\) are the thresholds and \(\sigma \) is the activation function. The activation function is usually taken to be of sigmoidal type, namely, it satisfies \( \lim _{x\rightarrow +\infty }\sigma (x)=1 \) and \( \lim _{x\rightarrow -\infty }\sigma (x)=0. \) Equation (1.1) can further be written in vector form as

$$\begin{aligned} N({\mathbf{x}}):=\sum _{i=1}^mc_i\sigma ({\mathbf{w}}_i\cdot {\mathbf{x}}+\theta _i),\quad {\mathbf{x}}\in R^d,\quad d\ge 1. \end{aligned}$$

We will study the following set of functions

$$\begin{aligned} {{\mathbb {N}}}_{n+1}^d(\phi ):=\left\{ N({\mathbf{x}})=\sum _{j=0}^nc_j\phi ({\mathbf{w}}_j\cdot {\mathbf{x}}+b_j), {\mathbf{w}}_j\in R^d, c_j, b_j\in R\right\} , \end{aligned}$$

where \({\mathbf{w}}_j\cdot {\mathbf{x}}\) denotes the ordinary dot product in \(R^d\) and \(\phi \) is a function from R to itself. We call the elements of \({{\mathbb {N}}}_{n+1}^d(\phi )\) three-layer FNNs with one hidden layer, and we call each summand of \(N({\mathbf{x}})\) a unit.

Function approximation by FNNs (1.1) has been extensively studied in the past years, with a variety of important results involving density or complexity (see, e.g., [1,2,3,4,5,6,7,8,9,10,11,12,13,14]). The density problem is to determine the conditions under which any function can be approximated by a three-layer FNN with arbitrary precision. The complexity problem is to ascertain the relationship between the smoothness of the approximated function and the cost necessary to attain an approximation with a desired accuracy, which is nearly equivalent to the problem of the degree of approximation [10]. In essence, the density problem is qualitative research, while the complexity problem is a quantitative study. Up to now, all kinds of density and complexity results on approximation of functions by the FNNs (1.1) in the set \({{\mathbb {N}}}_{n+1}^d(\phi )\) have been given by using different approaches for more or less general situations (for instance [10, 23,24,25,26,27,28,29,30] and references therein). However, in previous papers [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41], the weights and thresholds of the FNNs vary, so the results are very difficult to apply in practice.

Let \(S=\{{\mathbf{x}}_0, {\mathbf{x}}_1, \ldots , {\mathbf{x}}_n\}\subset R^d\) be a set of mutually distinct vectors, \(\{y_i, i=0,1,\ldots ,n\}\) be a set of real numbers, and \( ({\mathbf{x}}_0,y_0), ({\mathbf{x}}_1,y_1),\ldots , ({\mathbf{x}}_n,y_n) \) be a group of ordered pairs. We say that the FNN (1.1) \(N: R^d\rightarrow R\) interpolates these ordered pairs if \(N({\mathbf{x}}_i)=y_i, i=0, 1, \ldots , n\).

As is well known, three-layer FNNs with at most \(n+1\) summands (elements of \({{\mathbb {N}}}_{n+1}^d(\phi )\)) can learn \(n+1\) distinct samples \(({\mathbf{x}}_i, y_i)\) with zero error (exact approximation), and the weights \({\mathbf{w}}_j\) and thresholds \(b_j\) can be selected “almost” arbitrarily. Two main types of proofs of this conclusion have been provided. One is analytic, which can be found in [3, 4, 16, 17, 30, 31]. The other is algebraic and has constructive features, as given in [4, 6, 18,19,20,21, 31,32,33, 42]. Other direct methods of finding the weights are more difficult and burdensome [4, 18]. From the proofs in these references, we can see that these approaches are basically impractical, since almost all the algebraic methods and other direct approaches need to solve an \((n+1)\times (n+1)\) linear system or compute the inverse of an \((n+1)\times (n+1)\) matrix, particularly when the number of units is large.
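To illustrate the computational burden just mentioned, the following minimal Python sketch (our own construction, not taken from the cited works) assembles the \((n+1)\times (n+1)\) interpolation system with randomly chosen weights and thresholds and solves it directly; the choice of sigmoidal function, the random data and all variable names are assumptions made purely for illustration.

```python
# Hypothetical illustration of the direct algebraic approach: choose the
# weights w_k and thresholds theta_k "almost arbitrarily", build the
# (n+1) x (n+1) matrix A with A[i, k] = sigma(w_k * x_i + theta_k), and
# solve A c = y for the output coefficients. The O(n^3) solve (or matrix
# inversion) is what becomes burdensome when the number of units is large.
import numpy as np

rng = np.random.default_rng(0)
n = 10
x = np.sort(rng.uniform(-1.0, 1.0, n + 1))        # n+1 mutually distinct samples
y = np.sin(3.0 * x)                               # target values y_i
w = rng.normal(size=n + 1)                        # "almost arbitrary" weights
theta = rng.normal(size=n + 1)                    # "almost arbitrary" thresholds
sigma = lambda t: 1.0 / (1.0 + np.exp(-t))        # a sigmoidal activation
A = sigma(w[None, :] * x[:, None] + theta[None, :])
c = np.linalg.solve(A, y)                         # the (n+1) x (n+1) linear system
print(np.max(np.abs(A @ c - y)))                  # interpolation residual (grows with ill-conditioning as n increases)
```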

2 Description of problems

The problems considered in this paper are as follows. In previous studies on the density, complexity and interpolation of FNN approximation [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41], the weights and thresholds of the FNNs vary, and so the theoretical results are very difficult to apply in approximate calculation and other aspects. In order to make them easier to apply, Ismailov [43] studied function approximation by FNNs with weights varying on a finite set of directions. Nageswara [44] considered learning a function f by using feed-forward sigmoid networks with a single hidden layer and bounded weights. For any continuous function on a compact subset of R, Chui and Li [11] established a density result for FNNs with a sigmoidal function having integer weights and thresholds. Ito [12] proved that FNNs with sigmoidal functions having unit weights can approximate any continuous function to arbitrary precision on a compact subset of R.

What if the weights and thresholds of the FNNs are fixed? Can such FNNs still approximate arbitrary continuous functions? To the best of our knowledge, this question was first solved by Hahm and Hong [45]. They showed that FNNs with a sigmoidal activation function and fixed weights can approximate any function in \(C_0\) on R to arbitrary precision. Unfortunately, the findings mentioned above are almost all qualitative in nature [45]. From the application point of view, however, the quantitative study of FNN approximation is more useful.

The so-called quantitative research refers to upper and lower bound estimations of the approximation ability of a neural network. (If the upper and lower bound estimations have the same order, then we call this common order the essential order of approximation [46].) In the past ten years, Xu and his team have pioneered this field of research and obtained a series of important theoretical results, which lay some key foundations for further research on the complexity of neural network approximation [46,47,48,49,50,51,52,53]. Certainly, these important results are more inclined to theoretical research and are not easy to put into practice, because the inner universal functions are highly non-smooth or incomputable [47, 52], or infinitely differentiable functions are required [46, 47, 51, 53], or the number of hidden neurons is exceedingly large [46, 50, 52], or the weights and thresholds of the FNNs are variable [51,52,53].

Based on the above systematical analysis, the following problems arise naturally:

Problem 2.1

Can we fix the weights and thresholds in FNNs and provide a quantitative study of their approximation ability, so that they are easy to use in theory and application?

Problem 2.2

How is the approximation capability of a FNN with fixed weights and thresholds related to the topology of the network? Loosely speaking, how many hidden units are required for such a network to reach a predetermined approximation precision?

Problem 2.3

Is there a way to obtain the weights and thresholds of an exact FNN approximation without training? In other words, is there an effective method to find fixed weights and thresholds of an FNN that satisfy the interpolation conditions?

The purpose of this paper is to solve all the problems mentioned above by constructing three types of FNNs with optimized activation functions and fixed weights and thresholds and by establishing quantitative approximation theorems. In Sect. 3, the optimized activation functions and three types of FNNs are defined; then, in Sect. 4, some approximation and interpolation results are obtained. In Sect. 5, applying the theoretical results obtained in this paper, we demonstrate some numerical approximation and interpolation results that show good agreement with the theory. Finally, we summarize the paper and outline problems for further study.

3 Optimized activation function and constructed FNNs

In this section, we examine the activation functions usually used in the literature. In fact, a neuron cannot stay excited or inhibited indefinitely. Therefore, in this work, we assume that both excitement and inhibition are bounded. Based on this hypothesis, we define triangular and trapezoidal units. By using these neurons, we can provide many activation functions, which indicate naturally why the neural network has the ability of uniform approximation. In what follows, C[a,b] denotes the set of all continuous functions \(f: [a,b]\rightarrow R\) defined on the bounded interval [a,b].

Let \(\sigma : R\rightarrow [0,c]\) be the ramp function defined by

$$\begin{aligned} \sigma (x):= \left\{ \begin{array}{ll} 0, &{}\quad x\le -\,\mu _0,\\ c, &{}\quad x\ge \mu _0,\\ \frac{x+\mu _0}{2\mu _0}c, &{}\quad -\,\mu _0<x<\mu _0, \end{array} \right. c\in R^+,\quad 0<\mu _0\le \frac{1}{2}. \end{aligned}$$
(3.1)

Figure 2 exhibits the ramp function defined by Eq. (3.1).

Fig. 2 Ramp activation function

Remark 3.1

The ramp transfer function defined above is an example of a sigmoidal activation function when \(c=1\). If \(c=1\) and \(\mu _0=1/2\), it coincides with the one used in [22].

We now construct a new function \(\varphi _1\) by using the ramp function. We define

$$\begin{aligned} \varphi _1(x) &:= \sigma (x+\mu _0)-\sigma (x-\mu _0)\nonumber \\&= {} \left\{ \begin{array}{ll} 0, &{}\quad |x|\ge 2\mu _0,\\ \left( 1-\frac{1}{2\mu _0}|x|\right) c, &{}\quad |x|<2\mu _0, \end{array} \right. c\in R^+,\quad 0<\mu _0\le \frac{1}{2}. \end{aligned}$$
(3.2)

Thus, Eq. (3.2) defines a triangle function (see Fig. 3).

Fig. 3 Triangle activation function

Figures 2 and 3 exhibit activation functions with unbounded excitement or unbounded inhibition regions. In fact, from a biological point of view, excitement and inhibition should not happen abruptly; there should be balance (buffer) zones for both excitement and inhibition. For that reason, we can define a more reasonable nonnegative activation function as follows:

$$\begin{aligned} \varphi _2(x)&:= \sigma (x+2\mu _0)-\sigma (x-2\mu _0)\nonumber \\&= \left\{ \begin{array}{ll} 0, & \quad |x|\ge 2\mu _0,\\ \left( 1-\frac{1}{\mu _0}|x|\right) c, & \quad \mu _0<|x|<2\mu _0,\\ c, &{}\quad |x|\le \mu _0, \end{array} \right. c\in R^+,\quad 0<\mu _0\le \frac{1}{2}, \end{aligned}$$
(3.3)

which optimizes both the ramp activation function and the triangle activation function. Equation (3.3) describes a trapezoidal activation function, which is illustrated in Fig. 4.

Fig. 4 Trapezoidal activation function
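To make the three units concrete, the following is a minimal Python sketch (not part of the original paper) of the ramp, triangle and trapezoidal functions; it follows the difference-of-ramps definitions in Eqs. (3.2) and (3.3), and all function names are ours.

```python
import numpy as np

def ramp(x, c=1.0, mu0=0.5):
    """Ramp function sigma(x) of Eq. (3.1): 0 below -mu0, c above mu0, linear between."""
    return np.clip((np.asarray(x, dtype=float) + mu0) / (2.0 * mu0), 0.0, 1.0) * c

def triangle(x, c=1.0, mu0=0.5):
    """Triangle unit phi_1(x) = sigma(x + mu0) - sigma(x - mu0), Eq. (3.2)."""
    return ramp(x + mu0, c, mu0) - ramp(x - mu0, c, mu0)

def trapezoid(x, c=1.0, mu0=0.5):
    """Trapezoidal unit phi_2(x) = sigma(x + 2*mu0) - sigma(x - 2*mu0), Eq. (3.3)."""
    return ramp(x + 2.0 * mu0, c, mu0) - ramp(x - 2.0 * mu0, c, mu0)
```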

Remark 3.2

The ramp activation functions, triangle activation functions and trapezoidal activation functions are all piecewise linear activation functions.

The triangle functions \(\varphi _1(x)\) and the trapezoidal functions \(\varphi _2(x)\) have the following helpful properties:

\(\hbox {P}_1\): Both \(\varphi _1(x)\) and \(\varphi _2(x)\) are even functions;

\(\hbox {P}_2\): Both \(\varphi _1(x)\) and \(\varphi _2(x)\) are non-decreasing for \(x<0\) and non-increasing for \(x>0\);

\(\hbox {P}_3\): \({\mathrm{Supp}}(\varphi _j)\subseteq [-\,2\mu _0,2\mu _0]\subseteq [-\,1,1],~j=1, 2\).

We now consider the uniform nodes \(x_k=a+kh, k=0,1,\ldots ,n\), on the interval [a,d], where \(h=\frac{d-a}{n}=\frac{2(b-a)}{n}\) and \(b=\frac{d+a}{2}\).

Now, we are able to construct three types of FNNs based on the piecewise linear activation functions \(\varphi _1(x)\) and \(\varphi _2(x)\) defined above.

Definition 3.1

If \(n\in N^+\) and \(f: [a,d]\rightarrow R\) is a bounded and measurable function, then we construct the FNNs with optimized piecewise linear activation functions \(\varphi _1(x)\) and \(\varphi _2(x)\) as follows:

$$\begin{aligned} N_{n, j}(f, x):=\frac{\sum _{k=0}^nf(x_k)\varphi _j\left( \frac{n}{2(b-a)}x-\frac{n}{2(b-a)}x_k\right) }{\sum _{k=0}^n\varphi _j\left( \frac{n}{2(b-a)}x-\frac{n}{2(b-a)}x_k\right) },\quad x\in [a,d], \quad j=1, 2. \end{aligned}$$
(3.4)

Remark 3.3

Actually, similar neural networks, called “interpolation neural network operators,” have been introduced and studied in [22] in the case of \(\varphi _1(x)\) only, with the parameters \(c=1\) (i.e., when \(\sigma \) is a sigmoidal function) and \(\mu _0=1/2\), and with the step \(h=\frac{b-a}{n}\).
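A minimal numerical sketch of the operator (3.4) is given below; it assumes the `ramp`/`triangle`/`trapezoid` helpers sketched after Fig. 4 and vectorized callables for f and \(\varphi _j\), and the names `fnn_operator`, `phi`, etc., are ours rather than the paper's.

```python
import numpy as np

def fnn_operator(f, phi, a, d, n, x):
    """Sketch of N_{n,j}(f, x) in Eq. (3.4) on [a, d] with n+1 uniform nodes
    x_k = a + k*h, h = (d - a)/n = 2*(b - a)/n, b = (a + d)/2."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    b = (a + d) / 2.0
    h = (d - a) / n
    xk = a + h * np.arange(n + 1)                 # the interpolation nodes
    w = n / (2.0 * (b - a))                       # fixed input weight n/(2(b-a)) = 1/h
    basis = phi(w * (x[:, None] - xk[None, :]))   # phi_j(n(x - x_k)/(2(b-a)))
    return basis @ f(xk) / basis.sum(axis=1)      # quotient in Eq. (3.4)

# Example: N_{n,1} with the triangle unit; the default mu0 = 1/2 keeps the
# denominator bounded below, cf. (4.6).
# y = fnn_operator(np.cos, triangle, 0.0, 2.0, 20, np.linspace(0.0, 2.0, 200))
```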

Certainly, FNNs can also be constructed by using other types of sigmoidal functions as activation functions. Denote by

$$\begin{aligned} M_s(x):=\frac{1}{(s-1)!}\sum _{i=0}^s(-\,1)^i{s\atopwithdelims ()i}(\mu _0s+x-i)^{s-1}_{+},\quad x\in R, \end{aligned}$$

the well-known B-splines of order \(s\in N^+\) [54]. Here and hereafter, \((x)_+:=\max \{x, 0\}\) denotes the positive part of x. The functions \(M_s\) have compact support with \({\mathrm{supp}}(M_s)\subseteq [-\,s/2, s/2]\) for arbitrary \(s\in N^+\). Only when \(\mu _0=1/2\) are the \(M_s(x)\) the well-known central B-splines. We recall the definition of another kind of sigmoidal function \(\sigma _{M_s}(x)\), first introduced in [34]:

$$\begin{aligned} \sigma _{M_s}(x):=\int _{-\infty }^{x}M_s(t){\mathrm{d}}t,\quad x\in R. \end{aligned}$$

We can easily verify that \(\sigma _{M_1}(x)\) coincides exactly with the piecewise linear function \(\sigma (x)\). Now, we can define the following nonnegative activation functions:

$$\begin{aligned} \varphi _s(x):= \sigma _{M_s}(x+\mu _0)-\sigma _{M_s}(x-\mu _0),\quad x\in R, \end{aligned}$$
(3.5)

for any \(s\in N^+\). Similarly to the case of the piecewise linear functions \(\varphi _1\) and \(\varphi _2\), the function \(\varphi _s(x)\) possesses the following properties:

\(\hbox {P}_4\): \(\varphi _s(x)\) is an even function;

\(\hbox {P}_5\): \(\varphi _s(x)\) is non-decreasing for \(x<0\) and non-increasing for \(x\ge 0\);

\(\hbox {P}_6\): \({\mathrm{supp}}(\varphi _s)\subseteq [-\,K_s,K_s]:=[-\,\mu _0(s+1),\mu _0(s+1)]\), and \(\varphi _s\left( \frac{K_s}{2}\right) =\varphi _s\left( \frac{\mu _0(s+1)}{2}\right) >0\).

Definition 3.2

For any bounded and measurable function \(f: [a,d]\rightarrow R\), the FNNs with activation function \(\varphi _s\) are defined by

$$\begin{aligned} N_{n, s}(f, x):=\frac{\sum _{k=0}^nf(x_k)\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) }{\sum _{k=0}^n\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) },\quad x\in [a,d]. \end{aligned}$$
(3.6)

Remark 3.4

In fact, when \(s=1\), the FNNs defined in (3.6) reduce to those in (3.4). The neural networks \(N_{n,s}(f,x)\) were also originally defined and studied in [22] in the case \(\mu _0=1/2\) with the step \(h=\frac{b-a}{n}\).
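The following hedged sketch (our own code and names) implements \(M_s\) and \(\varphi _s\) of Eq. (3.5) via a simple quadrature of the B-spline over \([x-\mu _0, x+\mu _0]\), and then reuses `fnn_operator` from the sketch following Remark 3.3 to evaluate \(N_{n,s}\) of Eq. (3.6); it is intended for \(s\ge 2\) (the case \(s=1\) reduces to (3.4)).

```python
import math
import numpy as np

def bspline_M(x, s, mu0=0.5):
    """B-spline M_s(x) of order s >= 2 (central B-spline when mu0 = 1/2)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for i in range(s + 1):
        out += (-1.0) ** i * math.comb(s, i) * np.maximum(mu0 * s + x - i, 0.0) ** (s - 1)
    return out / math.factorial(s - 1)

def phi_s(x, s, mu0=0.5, m=201):
    """phi_s(x) = sigma_{M_s}(x + mu0) - sigma_{M_s}(x - mu0), Eq. (3.5),
    i.e. the integral of M_s over [x - mu0, x + mu0], by a trapezoid rule."""
    x = np.asarray(x, dtype=float)
    t = np.linspace(-mu0, mu0, m)
    dt = t[1] - t[0]
    vals = bspline_M(x[..., None] + t, s, mu0)    # M_s(x + t) on a quadrature grid
    return dt * (vals[..., 1:-1].sum(axis=-1) + 0.5 * (vals[..., 0] + vals[..., -1]))

# N_{n,s}(f, x) of Eq. (3.6): rescale the kernel by K_s = mu0*(s + 1) and reuse fnn_operator.
s, mu0 = 3, 0.5
Ks = mu0 * (s + 1)
# y = fnn_operator(np.cos, lambda u: phi_s(Ks * u, s, mu0), 0.0, 2.0, 20, np.linspace(0.0, 2.0, 200))
```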

4 Theoretical results

In this section, we prove the following quantitative approximation results for the families of FNNs \(N_{n, j},~j=1, 2\), and \(N_{n, s},~s\in N^+\). For every \(r\in N^{+}\), \(\delta >0\), and any function \(f\in C[a,d]\), we recall the well-known definitions of the rth order modulus of smoothness and of the second-order Lipschitz class of f [55]:

$$\begin{aligned} \omega _r(f,\delta ):=\sup _{a\le x,x+rt\le b, |t|\le \delta }|\Delta _t^rf(x)|, \end{aligned}$$

where \(\Delta _t^rf(x):=\Delta _t^1\Delta _t^{r-1}f(x)\) and \(\Delta _t^1f(x):=f(x+t)-f(x)\). Note that, when \(r=1\), we have \(\omega _1(f,\delta )=\omega (f,\delta )\); namely, the first order modulus of smoothness of f coincides with the modulus of continuity of f. The second-order Lipschitz class is defined by

$$\begin{aligned} \text {Lip}(f, \alpha )_2:=\left\{ f~|~\omega _2(f,\delta )\le M\delta ^{\alpha },~ \alpha \in (0,2],\; M\;{\mathrm{is}}\;{\mathrm{a}}\;{\mathrm{positive}}\;{\mathrm{constant}}\right\} . \end{aligned}$$

The rth order modulus of smoothness possesses the following helpful properties:

1. \(\omega _r(f,\delta )\) is a monotonically increasing continuous function of \(\delta \), and \(\omega _r(f,0)=0\);

2. If \(0\le s<r\), then \(\omega _r(f,\delta )\le 2^{r-s}\omega _s(f,\delta )\);

3. If \(0<\delta <\eta \), then \(0\le \omega _r(f,\eta )-\omega _r(f,\delta )\le 2^{r}r\omega (f,\eta -\delta )\), and \(\eta ^{-r}\omega _r(f,\eta )\le 2^{r}\delta ^{-r}\omega _r(f,\eta -\delta )\);

4. \(\omega _r(f,p\delta )\le p^{r}\omega _r(f,\delta )\) for any \(p\in N^{+}\);

5. \(\omega _r(f,q\delta )\le \omega _r(f,[q+1]\delta )\le (q+1)^{r}\omega _r(f,\delta )\) for arbitrary non-integer \(q>0\);

6. If f has an rth order continuous derivative, then \(\omega _r(f,\delta )\le \delta ^{r}||f^{(r)}||_{C}\), and \(\omega _{r+s}(f,\delta )\le \delta ^{r}\omega _s(f^{(r)},\delta )\).
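To make these quantities concrete, the following small sketch (our code; the grid sizes are arbitrary choices) estimates \(\omega _2(f,\delta )\) numerically on \([a,d]\) by maximizing the second difference over a grid of points and step sizes.

```python
import numpy as np

def omega2(f, a, d, delta, nt=50, nx=2000):
    """Grid estimate of omega_2(f, delta) = sup_{0<=t<=delta} sup_x |f(x+2t) - 2f(x+t) + f(x)|,
    with x and x + 2t kept inside [a, d]."""
    best = 0.0
    for t in np.linspace(0.0, delta, nt):
        x = np.linspace(a, d - 2.0 * t, nx)
        best = max(best, np.max(np.abs(f(x + 2.0 * t) - 2.0 * f(x + t) + f(x))))
    return best

# Example: for f(x) = x**3 on [0, 1], omega2 scales like delta**2.
# print(omega2(lambda x: x**3, 0.0, 1.0, 0.1))
```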

4.1 Upper bound of approximation

For continuous functions defined on [a,d], the following upper bound estimation theorem on the approximation by the constructed FNNs can be proved.

We now give the main results of this subsection.

Theorem 4.1

Let \(f\in C[a,d]\) be fixed. Then

$$\begin{aligned}&||N_{n, 1}(f, x)-f(x)||_{\infty }\le 4\omega _2\left( f, \frac{b-a}{n}\right) ,\quad \forall ~ n\in N^+, \end{aligned}$$
(4.1)
$$\begin{aligned}&||N_{n, 2}(f, x)-f(x)||_{\infty }\le 4\omega _2\left( f, \frac{b-a}{n}\right) ,\quad \forall ~ n\in N^+, \end{aligned}$$
(4.2)
$$\begin{aligned}&||N_{n, s}(f, x)-f(x)||_{\infty }\le \frac{2}{\varphi _s(\frac{K_s}{2})}\omega _2\left( f, \frac{b-a}{n}\right) ,\quad \forall ~n, s\in N^+. \end{aligned}$$
(4.3)

Remark 4.1

Theorem 4.1 is the direct (positive) theorem of approximation for the three kinds of FNNs. These upper bounds are inspired by the results originally given in [22], in the case of \(\varphi _1(x)\) with \(c=1\), \(\mu _0=1/2\) and step \(h=\frac{b-a}{n}\), and of \(\varphi _2(x)\) with \(\mu _0=1/2\) and step \(h=\frac{b-a}{n}\). Moreover, Eqs. (4.1) and (4.3) obviously deepen the results proved in [22] (for instance, if \(g(x)=x^n, n\in N\), is a polynomial, then \(\omega (g, t)=O(t)\), while \(\omega _2(g, t)=O(t^2)\)), and Eq. (4.2) represents a completely new result.
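As a hedged numerical illustration of the bound (4.1), the snippet below (reusing `fnn_operator`, `triangle` and `omega2` from the earlier sketches; the test function and grids are our choices) compares the sup-norm error of \(N_{n,1}\) with \(4\omega _2\left( f, \frac{b-a}{n}\right) \).

```python
import numpy as np

f, a, d = np.cos, 0.0, 2.0                 # any f in C[a, d]
b = (a + d) / 2.0
xs = np.linspace(a, d, 2001)
for n in (5, 10, 20, 40):
    err = np.max(np.abs(fnn_operator(f, triangle, a, d, n, xs) - f(xs)))
    bound = 4.0 * omega2(f, a, d, (b - a) / n)
    print(f"n = {n:3d}   sup-error = {err:.2e}   bound (4.1) = {bound:.2e}")
```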

Proof

In fact, for each \(x\in [a,d]\), by using P\(_1\) we obtain

$$\begin{aligned} \sum _{k=0}^n\varphi _j\left( \frac{n(x-x_k)}{2(b-a)}\right) = \sum _{k=0}^n\varphi _j\left( \frac{n|x-x_k|}{2(b-a)}\right) \ge \varphi _j\left( \frac{n|x-x_i|}{2(b-a)}\right) ,\quad j=1, 2, \end{aligned}$$
(4.4)

where \(i\in \{0,1,\ldots ,n\}\) satisfies \(|x-x_i|\le \mu _0h\). Thus,

$$\begin{aligned} \frac{n|x-x_i|}{2(b-a)}\le \frac{n\mu _0h}{2(b-a)}=\mu _0, \end{aligned}$$
(4.5)

By Eqs. (4.4), (4.5) and P\(_2\), we have

$$\begin{aligned} \sum _{k=0}^n\varphi _j\left( \frac{n(x-x_k)}{2(b-a)}\right) \ge \varphi _j\left( \frac{n|x-x_i|}{2(b-a)}\right) \ge \varphi _j(\mu _0)=\left\{ \begin{array}{ll} \frac{1}{2}c, &{}\quad j=1,\\ c, &{}\quad j=2. \end{array} \right. \end{aligned}$$
(4.6)

In addition, for any bounded and measurable function \(f: [a,d]\rightarrow R\), we obtain

$$\begin{aligned} |N_{n, j}(f, x)|\le ||f||_{\infty }\frac{\sum _{k=0}^n\varphi _j \left( \frac{n(x-x_k)}{2(b-a)}\right) }{\sum _{k=0}^n \varphi _j\left( \frac{n(x-x_k)}{2(b-a)}\right) } =||f||_{\infty }<+\infty ,\quad j=1, 2, \end{aligned}$$

for each \(x\in [a,d]\), where \(||f||_{\infty }:= \sup _{x\in [a,d]}|f(x)|\).

For each \(x\in [a,d]\) and by (4.6), we know that

$$\begin{aligned} \big |N_{n, 1}(f, x)-f(x)\big |= & {} \frac{\left| \sum _{k=0}^nf(x_k)\varphi _1 \left( \frac{n(x-x_k)}{2(b-a)}\right) -f(x)\sum _{k=0}^n\varphi _1\left( \frac{n(x-x_k)}{2(b-a)}\right) \right| }{\sum _{k=0}^n\varphi _1\left( \frac{n(x-x_k)}{2(b-a)}\right) }\\\le & {} \frac{2}{c}\left| \sum _{k=0}^nf(x_k)\varphi _1 \left( \frac{n(x-x_k)}{2(b-a)}\right) -f(x)\sum _{k=0}^n\varphi _1\left( \frac{n(x-x_k)}{2(b-a)}\right) \right| \\\le & {} \frac{2}{c}\sum _{k=0}^n\left| f(x_k)-f(x)\right| \varphi _1\left( \frac{n(x-x_k)}{2(b-a)}\right) , \end{aligned}$$

for every fixed \(n\in N^+\). We now choose \(i\in \{0,1,\ldots ,n-1\}\) such that \(x_i\le x\le x_{i+1}\), then

$$\begin{aligned} \left| N_{n, 1}(f,x)-f(x)\right|\le & {} \frac{2}{c}\left[ \sum _{k=0,k\ne i,i+1}^n|f(x_k)-f(x)|\varphi _1\left( \frac{n(x-x_k)}{2(b-a)}\right) \right. \\&\left. +\,|f(x_i)-f(x)|\varphi _1\left( \frac{n(x-x_i)}{2(b-a)}\right) \right. \\&\left. +\,|f(x_{i+1})-f(x)|\varphi _1\left( \frac{n(x-x_{i+1})}{2(b-a)}\right) \right] \\=: & {} \frac{2}{c}[I_1+I_2+I_3]. \end{aligned}$$

Obviously, for \(k\ne i, i+1\), we obtain \(\frac{n|x-x_k|}{2(b-a)}\ge \frac{nh}{2(b-a)}=1\), then by the properties P\(_1\), P\(_2\) and P\(_3\), we have

$$\begin{aligned} 0\le \varphi _1\left( \frac{n(x-x_k)}{2(b-a)}\right) =\varphi _1 \left( \frac{n|x-x_k|}{2(b-a)}\right) \le \varphi _1\left( \frac{nh}{2(b-a)}\right) =\varphi _1(1)=0, \end{aligned}$$

which implies \(I_1=0\). Since \(|x_i-x|\le h \) and \(|x_{i+1}-x|\le h\), we find

$$\begin{aligned} |f(x_i)-f(x)| \le \omega _2\big (f,\frac{b-a}{n}\big ). \end{aligned}$$

Similarly,

$$\begin{aligned} |f(x_{i+1})-f(x)|\le \omega _2\left( f,\frac{b-a}{n}\right) . \end{aligned}$$

Finally, we obtain

$$\begin{aligned} I_1+I_2+I_3=I_2+I_3\le 2c\omega _2\left( f,\frac{b-a}{n}\right) . \end{aligned}$$

With this, the proof of (4.1) in Theorem 4.1 is completed. Applying the same method used in the proof of (4.1), we can prove (4.2); we omit the details. Next, we prove (4.3) in Theorem 4.1.

Employing a technique similar to that adopted in Eqs. (4.4)–(4.6), it is easy to prove that the FNNs (3.6) are well defined for arbitrary \(n\in N^+\) and

$$\begin{aligned} \sum _{k=0}^n\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) \ge \varphi _s\left( \frac{K_s}{2}\right) >0. \end{aligned}$$

Now, for each \(x\in [a, d]\), and \(f\in C[a,d]\), there exists \(i\in \{0, 1,2,\ldots ,n-1\}\) such that \(x_i\le x\le x_{i+1}\), and then we have

$$\begin{aligned} \left| N_{n, s}(f, x)-f(x)\right|= & {} \frac{\left| \sum _{k=0}^nf(x_k)\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) -f(x)\sum _{k=0}^n\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) \right| }{\sum _{k=0}^n\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) }\\\le & {} \frac{1}{\varphi _s(K_s/2)} \left| \sum _{k=0}^nf(x_k)\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) \right. \\&\left. -\,f(x)\sum _{k=0}^n\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) \right| \\\le & {} \frac{1}{\varphi _s(K_s/2)} \sum _{k=0}^n|f(x_k)-f(x)|\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) \\= & {} \frac{1}{\varphi _s(K_s/2)}\left[ \sum _{k=0,k\ne i,i+1}^n|f(x_k)-f(x)|\varphi _s\left( K_s\frac{n(x-x_k)}{2(b-a)}\right) \right. \\&\left. +\,|f(x_i)-f(x)|\varphi _s\left( K_s\frac{n(x-x_i)}{2(b-a)}\right) \right. \\&\left. +\,|f(x_{i+1})-f(x)|\varphi _s\left( K_s\frac{n(x-x_{i+1})}{2(b-a)}\right) \right] \\=: & {} \frac{1}{\varphi _s(K_s/2)}[J_1+J_2+J_3]. \end{aligned}$$

The addends \(J_1, J_2\) and \(J_3\) can be handled as in the proof of (4.1) in Theorem 4.1, and then (4.3) in Theorem 4.1 follows immediately. This completes the proof of Theorem 4.1.

4.2 Lower bound of approximation

Until now, many results about density and upper bound estimations on approximation of functions by the FNNs (1.1) in the set \({{\mathbb {N}}}_{n+1}^d(\phi )\) have been given by many researchers [2, 9, 23, 24, 26,27,28,29,30,31,32,33, 54,55,56,57,58,59,60,61,62,63,64,65]. In fact, because an upper bound estimation can only control one side of the approximation error, such results might be too loose to perfectly reflect the approximation capability of the FNNs. Naturally, in order to characterize the approximation ability of FNNs more precisely, besides the upper bound estimation, a lower bound estimation that reflects the worst approximation precision of the network is also worth studying. We emphasize that the results of this subsection are completely new.

We now give the main results of this subsection.

Theorem 4.2

Let \(f\in C[a,d]\) be fixed. Then

$$\begin{aligned}&\omega _2\left( f,\frac{b-a}{n}\right) \le \frac{C}{n}\sum _{i=1}^n||N_{i, j}(f, x)-f(x)||_{\infty },\quad j=1,2,~n\in N^+, \end{aligned}$$
(4.7)
$$\begin{aligned}&\omega _2\left( f,\frac{b-a}{n}\right) \le \frac{C}{n}\sum _{i=1}^n||N_{i, s}(f, x)-f(x)||_{\infty },\quad \forall ~ s\in N^+. \end{aligned}$$
(4.8)

Here and hereafter, C denotes a positive constant independent of n and f, which may be different at different occurrences.

Remark 4.2

Theorem 4.2 is the converse theorem of approximation for the three kinds of FNNs. These conclusions reveal three lower bound estimations on the approximation precision of these FNNs, which means that the average approximation error of these FNNs over the number of hidden neurons is bounded from below by the second order modulus of smoothness of the function f.

In order to prove Theorem 4.2, we use the famous Bernstein polynomial and the extended Bernstein polynomial as two basic tools. Let \(f\in C[0,1]\); the sequence of Bernstein polynomials of f(x) is defined by [66]

$$\begin{aligned} B_n(f,x):=\sum _{k=0}^nf\left( \frac{k}{n}\right) {n\atopwithdelims ()k}x^k(1-x)^{n-k},\quad x\in [0,1]. \end{aligned}$$

Similarly, if \(f\in C[a,d]\), then we define the extended Bernstein polynomials for f(x) as follows

$$\begin{aligned} {\mathrm{EB}}_n(f,x):=\sum _{k=0}^nf\left( a+\frac{k}{n}(d-a)\right) {n\atopwithdelims ()k}\left( \frac{x-a}{d-a}\right) ^k\left( 1-\frac{x-a}{d-a}\right) ^{n-k},\quad x\in [a,d]. \end{aligned}$$

For nearly half a century, the Bernstein polynomial and its refinements have attracted much interest, and a great number of interesting results on the classical Bernstein polynomial have been obtained [66,67,68,69].
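Both operators can be evaluated directly; the following is a minimal sketch (our code and names) of \(B_n\) and \({\mathrm{EB}}_n\), where the extended operator is obtained by pulling f back to [0, 1].

```python
import math
import numpy as np

def bernstein(f, n, x):
    """Classical Bernstein polynomial B_n(f, x) on [0, 1]."""
    x = np.asarray(x, dtype=float)
    k = np.arange(n + 1)
    binom = np.array([math.comb(n, j) for j in k], dtype=float)
    basis = binom * x[..., None] ** k * (1.0 - x[..., None]) ** (n - k)
    return basis @ f(k / n)

def extended_bernstein(f, n, x, a, d):
    """Extended Bernstein polynomial EB_n(f, x) on [a, d] via the substitution u = (x - a)/(d - a)."""
    x = np.asarray(x, dtype=float)
    g = lambda u: f(a + u * (d - a))              # pull f back to [0, 1]
    return bernstein(g, n, (x - a) / (d - a))
```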

The following fundamental result on the classical Bernstein polynomial is well known [55]. For the readers' convenience, we state the lemmas as follows.

Lemma 4.1

([55]) Let \(f\in C[0,1]\). Then there is a positive constant C such that

$$\begin{aligned} \omega _2\left( f;\frac{1}{n}\right) \le \frac{C}{n}\sum _{k=1}^n||B_k(f,x)-f(x)||_{\infty }. \end{aligned}$$

From Lemma 4.1, we can easily obtain the following Lemma 4.2.

Lemma 4.2

Let \(f\in C[a,d]\). Then there is a positive constant C such that

$$\begin{aligned} \omega _2\left( f;\frac{b-a}{n}\right) \le \omega _2\left( f;\frac{d-a}{n}\right) \le \frac{C}{n}\sum _{k=1}^n||{\mathrm{EB}}_k(f,x)-f(x)||_{\infty }. \end{aligned}$$

Lemma 4.3

([45]) Let \(f\in C[a, d]\) and let \(\sigma \) be a bounded measurable sigmoidal function on R. Then, for any \(\epsilon >0\), there is a neural network \(N_n(x)\) of the form (1.1) such that

$$\begin{aligned} |N_n(x)-f(x)|<\epsilon , \end{aligned}$$

where \(N_n(x)=\sum _{i=1}^nc_i\sigma (w_{i}x+\theta ), ~c_i, w_i, \theta \in R\).

We are now ready to prove Theorem 4.2. First, we need an equivalent representation of the extended Bernstein operator:

$$\begin{aligned} {\mathrm{EB}}_n(f,x)= & {} \sum _{k=0}^nf\left( a+\frac{k}{n}(d-a)\right) {n\atopwithdelims ()k}\left( \frac{x-a}{d-a}\right) ^k\left( 1-\frac{x-a}{d-a}\right) ^{n-k}\nonumber \\= & {} \sum _{k=0}^nf\left( a+\frac{k}{n}(d-a)\right) {n\atopwithdelims ()k}\left( \frac{x-a}{d-a}\right) ^k\nonumber \\&\quad \times&\left( 1+C_{n-k}^1\left( -\frac{x-a}{d-a}\right) +\cdots +C_{n-k}^i \left( -\frac{x-a}{d-a}\right) ^i+\cdots +\left( -\frac{x-a}{d-a}\right) ^{n-k}\right) \nonumber \\= & {} \sum _{k=0}^nf\left( a+\frac{k}{n}(d-a)\right) {n\atopwithdelims ()k}\sum _{i=0}^{n-k}(-\,1)^i\frac{(n-k)!}{i!(n-k-i)!}\left( \frac{x-a}{d-a}\right) ^{i+k}\nonumber \\= & {} \sum _{k=0}^n\sum _{i=0}^{n-k}f\left( a+\frac{k}{n}(d-a)\right) {n\atopwithdelims ()k}(-\,1)^i\frac{(n-k)!}{i!(n-k-i)!}\left( \frac{x-a}{d-a}\right) ^{i+k}\nonumber \\= & {} \sum _{k=0}^n\sum _{i=0}^{n-k}d_{i,k}\left( \frac{x-a}{d-a}\right) ^{i+k},\quad x\in [a,d], \end{aligned}$$
(4.9)

where \(d_{i,k}=(-1)^if (a+\frac{k}{n}(d-a)){n\atopwithdelims ()k}\frac{(n-k)!}{i!(n-k-i)!}\).

Second, let r be a fixed integer and let \(P_r(x)=a_rx^r, x\in [a, d]\), be a univariate polynomial of degree r. By Lemma 4.3, we know that there is a FNN of the form (1.1), whose number of hidden units is not less than \(r+1\), such that

$$\begin{aligned} |N_n(x)-P_r(x)|<\epsilon . \end{aligned}$$

We find that each term in (4.9) is a univariate monomial in x of degree \(i+k~ (i+k\le n)\); therefore, it can be approximated arbitrarily well by a FNN of the form

$$\begin{aligned} N_{i+k+1}(x)=\sum _{l=1}^{K_{i+k}}c_{l,i+k}\sigma (w_{l,i+k}x+\theta ),\quad K_{i+k}\ge i+k+1. \end{aligned}$$
(4.10)

Because \(B_n(f,x)\) and \({\mathrm{EB}}_n(f,x)\) can approximate f, the following FNNs

$$\begin{aligned} \sum _{k=0}^n\sum _{i=0}^{n-k}d_{i,k}\sum _{l=1}^{K_{i+k}} c_{l,i+k}\sigma (w_{l,i+k}x+\theta ),\quad c_{l,i+k},w_{l,i+k}\in R,\quad K_{i+k}\ge i+k+1 \end{aligned}$$

can approximate f to any accuracy. Consequently, the networks (4.10) are the FNN models we use in this subsection.

Third, according to Lemma 4.3, the polynomial \(x^{i+k}~(i+k\le n)\) can be approximated by a network of the following form

$$\begin{aligned} N_{K_{i+k}}=\sum _{l=1}^{K_{i+k}}c_{l,i+k} \sigma (w_{l,i+k}x+\theta ),\quad c_{l,i+k},w_{l,i+k}\in R,\quad K_{i+k}\ge i+k+1, \end{aligned}$$
(4.11)

with approximation error

$$\begin{aligned} |N_{K_{i+k}}-x^{i+k}|<\epsilon . \end{aligned}$$
(4.12)

Equations (4.11) and (4.12) imply

$$\begin{aligned} ||N_m(f,x)-{\mathrm{EB}}_m(f,x)||_{\infty }= & {} \left| \left| \sum _{k=0}^m\sum _{i=0}^{m-k}d_{i,k}\left\{ x^{i+k}-\sum _{l=1}^{K_{i+k}} c_{l,i+k}\sigma (\omega _{l,i+k}x+\theta )\right\} \right| \right| _{\infty }\\\le & {} \sum _{k=0}^m\sum _{i=0}^{m-k}|d_{i,k}|\max _{x\in [a,d]} \left| x^{i+k}-N_{K_{i+k}}\right| \\\le & {} \epsilon \sum _{k=0}^m\sum _{i=0}^{m-k}|d_{i,k}|. \end{aligned}$$

Next, taking \(f(x_k)=\sum _{i=0}^{m-k}\sum _{l=1}^{K_{i+k}}d_{i,k}c_{l,i+k}\), \(\varphi _j=\sigma \), \(\omega _{l,i+k}=\frac{n}{2(b-a)}\), and \(\theta =-\frac{n}{2(b-a)}x_k\) in (3.4), and using (4.6), we have

$$\begin{aligned} \left| \left| N_{m, j}(f, x)-{\mathrm{EB}}_m(f,x)\right| \right| _{\infty }= & {} \left| \left| \frac{\sum _{k=0}^mf(x_k) \varphi _j\left( \frac{n(x-x_k)}{2(b-a)}\right) }{\sum _{k=0}^m\varphi _j\left( \frac{n(x-x_k)}{2(b-a)}\right) }-{\mathrm{EB}}_mf(x) \right| \right| _{\infty }\\\le & {} \frac{2}{c}\left| \left| \sum _{k=0}^mf(x_k)\varphi _j \left( \frac{n(x-x_k)}{2(b-a)}\right) -{\mathrm{EB}}_m(f,x)\right| \right| _{\infty }\\\le & {} C\left| \left| \sum _{k=0}^mf(x_k)\varphi _j \left( \frac{n(x-x_k)}{2(b-a)}\right) -{\mathrm{EB}}_m(f,x)\right| \right| _{\infty }\\= & {} C\left| \left| \sum _{k=0}^m\sum _{i=0}^{m-k}d_{i,k}\sum _{l=1}^{K_{i+k}} c_{l,i+k}\sigma (w_{l,i+k}x+\theta )-{\mathrm{EB}}_m(f,x)\right| \right| _{\infty }\\= & {} C||N_m(f, x)-{\mathrm{EB}}_m(f, x)||_{\infty }\\\le & {} C\epsilon \sum _{k=0}^m\sum _{i=0}^{m-k}|d_{i,k}|. \end{aligned}$$

Finally, for the constructed FNN

$$\begin{aligned} \sum _{k=0}^n\sum _{i=0}^{n-k}d_{i,k}\sum _{l=1}^{K_{i+k}} c_{l,i+k}\sigma (w_{l,i+k}x+\theta ),\quad c_{l,i+k},w_{l,i+k}\in R,\quad K_{i+k}\ge i+k+1, \end{aligned}$$

we obtain the lower bound estimation of \(||N_{n,j}-f||_{\infty }, ~j=1,2\) as follows:

$$\begin{aligned} \omega _2\left( f,\frac{b-a}{n}\right)\le & {} \frac{C}{n}\sum _{k=1}^n||{\mathrm{EB}}_k(f, x)-f(x)||_{\infty }\\\le & {} \frac{C}{n}\sum _{k=1}^n\left\{ ||{\mathrm{EB}}_k(f, x)-N_{k,j}(f, x)||_{\infty }+||N_{k,j}(f, x)-f(x)||_{\infty }\right\} \\\le & {} \frac{C}{n}\sum _{k=1}^n||N_{k,j}(f, x)-f(x)||_{\infty }+\frac{C\epsilon }{n}\sum _{k=0}^n \sum _{i=0}^{n-k}\sum _{l=1}^{K_{i+k}}|d_{i,k}|. \end{aligned}$$

Letting \(\epsilon \) tend to zero, it then follows that

$$\begin{aligned} \omega _2\left( f,\frac{b-a}{n}\right) \le \frac{C}{n}\sum _{k=1}^n||N_{k,j}(f, x)-f(x)||_{\infty }, j=1,2. \end{aligned}$$

Thus the first part of Theorem 4.2 is proved. The second part of Theorem 4.2 follows immediately by the same arguments used in the proof of the first part; to avoid repetition, we omit the details.

4.3 Essential order of approximation

If the arithmetic means on the right-hand sides of Eqs. (4.7) and (4.8) could be replaced by \(||N_{n,j}(f, \cdot )-f(\cdot )||_{\infty }, j=1,2\), and \(||N_{n,s}(f, \cdot )-f(\cdot )||_{\infty }\), respectively, then, by Theorems 4.1 and 4.2, the upper and lower bound estimations of approximation by the FNNs \(N_{n,j}(f), j=1,2\), and \(N_{n,s}(f)\) would coincide in order with the second order modulus of smoothness of the approximated function f. Namely,

$$\begin{aligned} \omega _2\left( f,\frac{b-a}{n}\right) \sim ||N_{n,j}(f, \cdot )-f(\cdot )||_{\infty },\quad j=1,2, \end{aligned}$$

and

$$\begin{aligned} \omega _2\left( f,\frac{b-a}{n}\right) \sim ||N_{n,s}(f, \cdot )-f(\cdot )||_{\infty },\quad s\in N^{+}. \end{aligned}$$

Unfortunately, up to now, we cannot answer these questions for arbitrary function classes. Happily, we can solve them when the approximated function f belongs to the second-order Lipschitz class with exponent \(\alpha ~(0<\alpha \le 2)\). From this point of view, the following Theorem 4.3 can be drawn directly by combining Theorem 4.1 with Theorem 4.2.

Theorem 4.3

Let \(f\in C[a,d]\) be fixed. Then

$$\begin{aligned}&||N_{n, j}(f, x)-f(x)||_{\infty }=O(n^{-\alpha })\Leftrightarrow f\in {\mathrm{Lip}}(\alpha )_2,\quad {\mathrm{for}}\;\; j=1, 2.\\&||N_{n, s}(f, x)-f(x)||_{\infty }=O(n^{-\alpha })\Leftrightarrow f\in {\mathrm{Lip}}(\alpha )_2,\quad {\mathrm{for}}\;\; s\in N^+. \end{aligned}$$

Remark 4.3

The results of this subsection are completely new. Theorem 4.3 shows that the intrinsic approximation order of these three kinds of FNNs is \(O(n^{-\alpha })\). Thus, the approximation capability of the three kinds of FNNs is completely determined by the smoothness of the approximated function. That is to say, the smoother the approximated function, the higher the precision of approximation; however, the maximal precision of approximation cannot exceed \(O(n^{-\alpha })\).
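As a hedged empirical check of this rate, the snippet below (reusing `fnn_operator` and `triangle` from the earlier sketches; the test function is our choice) estimates the observed order of \(||N_{n,1}(f)-f||_{\infty }\) for a twice continuously differentiable f, for which \(\alpha =2\).

```python
import numpy as np

f, a, d = np.sin, 0.0, 3.0                # f'' is bounded, so f is in Lip(2)_2
xs = np.linspace(a, d, 4001)
errs = []
for n in (10, 20, 40, 80, 160):
    errs.append(np.max(np.abs(fnn_operator(f, triangle, a, d, n, xs) - f(xs))))
rates = np.log2(np.array(errs[:-1]) / np.array(errs[1:]))
print(rates)                              # observed orders, expected to approach 2
```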

Remark 4.4

Theorems 4.1, 4.2 and 4.3 give affirmative answers to Problems 2.1 and 2.2. In the FNNs \(N_{n, 1}\) and \(N_{n, 2}\), the connection weights and the thresholds are, respectively, equal to \(\frac{n}{2(b-a)}\) and \(\frac{-nx_k}{2(b-a)}\). Moreover, in the FNNs \(N_{n, s}\), the connection weights and the thresholds are equal to \(K_s\frac{n}{2(b-a)}=\mu _0(s+1)\frac{n}{2(b-a)}\) and \(-K_s\frac{nx_k}{2(b-a)}=-\mu _0(s+1)\frac{nx_k}{2(b-a)}\). Because \(a, b\in R\) and \(\mu _0\in R^+\) are constants, \(s\in N^+\) is a positive integer, and \(n\in N^+\) is the number of hidden neurons, the connection weights and the thresholds of the FNNs satisfying the approximation conditions can be written down directly, without any training. This gives a quantitative description of the approximation precision of these FNNs and characterizes the implicit relationship among the precision of approximation, the number of hidden neurons and the smoothness of the approximated function.
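For reference, the fixed weights and thresholds of Remark 4.4 can be written down directly; the short sketch below (variable names are ours) lists them for given \(a, d, \mu _0, n\) and s.

```python
import numpy as np

a, d, mu0, n, s = -1.0, 1.0, 0.5, 10, 3
b = (a + d) / 2.0
xk = a + (d - a) / n * np.arange(n + 1)   # the uniform nodes x_k

w12 = n / (2.0 * (b - a))                 # input weight of N_{n,1} and N_{n,2}
th12 = -w12 * xk                          # thresholds of N_{n,1} and N_{n,2}
Ks = mu0 * (s + 1)
ws = Ks * n / (2.0 * (b - a))             # input weight of N_{n,s}
ths = -ws * xk                            # thresholds of N_{n,s}
```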

4.4 Interpolation

In this subsection, we prove the following interpolation (exact approximation) results for the families of FNNs \(N_{n, j},~j=1, 2\), and \(N_{n, s},~s\in N^+\).

Theorem 4.4

Let \(f: [a,d]\rightarrow R\) be a bounded and measurable function and \(n\in N^+\). Then

$$\begin{aligned} N_{n, 1}(f, x_i)& = {} f(x_i),\quad {\mathrm{for}}\;{\mathrm{every}}\; i=0,1,\ldots ,n, \end{aligned}$$
(4.13)
$$\begin{aligned} N_{n, 2}(f, x_i)& = {} f(x_i),\quad {\mathrm{for}}\;{\mathrm{every}}\; i=0,1,\ldots ,n, \end{aligned}$$
(4.14)
$$\begin{aligned} N_{n, s}(f, x_i)& = {} f(x_i),\quad {\mathrm{for}}\;{\mathrm{every}}\; i=0,1,\ldots , n~{\mathrm{and}}~s\in N^+. \end{aligned}$$
(4.15)

Remark 4.5

The results in Theorem 4.4 are inspired by the results originally given in [22], in the case of \(\varphi _1(x)\) with \(c=1\), \(\mu _0=1/2\) and step \(h=\frac{b-a}{n}\), and of \(\varphi _2(x)\) with \(\mu _0=1/2\) and step \(h=\frac{b-a}{n}\). Moreover, Eqs. (4.13) and (4.15) represent a slight extension of the results proved in [22], and Eq. (4.14) represents a completely new result.

Proof

Let \(i\in \{0,1,\ldots ,n\}\) be fixed. If \(k=i\), then we obtain

$$\begin{aligned} \varphi _j\left( \frac{n(x_i-x_k)}{2(b-a)}\right) =\varphi _j(0)=c,\quad j=1,2. \end{aligned}$$

On the other hand, if \(k\ne i\), then we have

$$\begin{aligned} \frac{n|x_i-x_k|}{2(b-a)}\ge \frac{nh}{2(b-a)}=1. \end{aligned}$$

Thus, by using \(0<\mu _0\le \frac{1}{2}\) and the properties P\(_1\) and P\(_2\), we obtain that

$$\begin{aligned} 0=\varphi _j(2\mu _0)=\varphi _j(1)\ge \varphi _j \left( \frac{n|x_i-x_k|}{2(b-a)}\right) =\varphi _j \left( \frac{n(x_i-x_k)}{2(b-a)}\right) \ge 0,\quad j=1, 2. \end{aligned}$$

Therefore, we get

$$\begin{aligned} \varphi _j\left( \frac{n(x_i-x_k)}{2(b-a)}\right) = \left\{ \begin{array}{ll} c, &{}\quad i=k,\\ 0, &{}\quad i\ne k,\\ \end{array} \right. \quad j=1, 2 \end{aligned}$$

for each \(i, k=0,1,\ldots ,n\), and this implies that

$$\begin{aligned} N_{n, j}(f, x_i)=\frac{f(x_i)\varphi _j\left( \frac{n(x_i-x_k)}{2(b-a)}\right) }{\varphi _j\left( \frac{n(x_i-x_k)}{2(b-a)}\right) }=f(x_i),\quad j=1, 2, \end{aligned}$$

for any \(i\in \{0,1,\ldots ,n\}\). Thus the first part of Theorem 4.4 is proved.

Next, we prove the second part of Theorem 4.4. If \(i\ne k\), \(i,k=0,1,\ldots ,n\), then we have \(|x_i-x_k|\ge h\) and \(K_s\frac{n|x_i-x_k|}{2(b-a)}\ge K_s\). By the properties of \(\varphi _s\), it turns out that

$$\begin{aligned} \varphi _s\left( K_s\frac{n(x_i-x_k)}{2(b-a)}\right) = \varphi _s\left( K_s\frac{n|x_i-x_k|}{2(b-a)}\right) =0. \end{aligned}$$

Thus the second part of Theorem 4.4 follows immediately by the same arguments used for the first part. For the sake of brevity, we omit the details.

Remark 4.6

Theorem 4.4 gives the interpolation results for these FNNs with fixed weights and thresholds, which complements Theorems 4.1 and 4.2 and successfully answers Problem 2.3.
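A hedged numerical check of Theorem 4.4 (reusing `fnn_operator` and `triangle` from the earlier sketches; the test function and the value of n are our choices) evaluates \(N_{n,1}\) at the nodes and confirms that the residuals are at machine-precision level.

```python
import numpy as np

f = lambda x: np.exp(x) + x ** 2
a, d, n = -1.0, 1.0, 8
xk = a + (d - a) / n * np.arange(n + 1)   # interpolation nodes
resid = np.max(np.abs(fnn_operator(f, triangle, a, d, n, xk) - f(xk)))
print(resid)                              # ~ 1e-16, i.e. exact interpolation up to rounding
```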

5 Numerical results

Because continuous functions on bounded intervals are normally considered as target functions in engineering and other applications [1, 4,5,6, 14, 32], we focus here on numerical approximation of continuous functions on bounded intervals. Theorems 4.1, 4.2, 4.3 and 4.4 show that any continuous function on the bounded interval [a,d] can be approximated by FNNs with an optimized piecewise linear activation function and fixed weights. We illustrate the theoretical results using different activation functions and the error bound of the FNN approximation. All computations are done in Matlab 7.0.

Example 5.1

We choose the continuous function \(f(x)=x^3\) as the target function and study the approximation of f(x) on the bounded interval [a,d] by the FNNs with optimized piecewise linear activation functions and fixed weights defined by (3.4).

By Eqs. (3.2) and (3.3), and letting \(c=1\), we have

$$\begin{aligned} \varphi _1\left( \frac{n(x-x_k)}{2(b-a)}\right) =\left\{ \begin{array}{ll} 0, &{}\quad |x-x_k|\ge \frac{4(b-a)}{n}\mu _0,\\ 1-\frac{1}{2\mu _0}\left| \frac{n}{2(b-a)}(x-a)-k\right| , &{}\quad |x-x_k|<\frac{4(b-a)}{n}\mu _0, \end{array} \right. \end{aligned}$$

and

$$\begin{aligned} \varphi _2\left( \frac{n(x-x_k)}{2(b-a)}\right) =\left\{ \begin{array}{ll} 0, &{}\quad |x-x_k|\ge \frac{4(b-a)}{n}\mu _0,\\ 1-\frac{1}{\mu _0}\left| \frac{n}{2(b-a)}(x-a)-k\right| , &{}\quad \frac{2(b-a)}{n}\mu _0<|x-x_k|<\frac{4(b-a)}{n}\mu _0,\\ 1, &{}\quad |x-x_k|\le \frac{2(b-a)}{n}\mu _0. \end{array} \right. \end{aligned}$$

Then, the FNNs \(N_{n,1}\) and \(N_{n,2}\) are simply reduced to

$$\begin{aligned} N_{n,1}(f)=\left\{ \begin{array}{ll} 0, &{}\quad |x-x_k|\ge \frac{4(b-a)}{n}\mu _0,\\ \frac{\sum _{k=0}^nf(x_k)\left( 1-\frac{1}{2\mu _0}\left| \frac{n}{2(b-a)}(x-a)-k\right| \right) }{\sum _{k=0}^n\left( 1-\frac{1}{2\mu _0}\left| \frac{n}{2(b-a)}(x-a)-k\right| \right) }, &{}\quad |x-x_k|<\frac{4(b-a)}{n}\mu _0, \end{array} \right. \end{aligned}$$
(5.1)

and

$$\begin{aligned} N_{n,2}(f)=\left\{ \begin{array}{ll} 0, &{}\quad |x-x_k|\ge \frac{4(b-a)}{n}\mu _0,\\ \frac{\sum _{k=0}^nf(x_k)\left( 1-\frac{1}{\mu _0}\left| \frac{n}{2(b-a)}(x-a)-k\right| \right) }{\sum _{k=0}^n\left( 1-\frac{1}{\mu _0}\left| \frac{n}{2(b-a)}(x-a)-k\right| \right) }, &{}\quad \frac{2(b-a)}{n}\mu _0<|x-x_k|<\frac{4(b-a)}{n}\mu _0,\\ \left| \frac{\sum _{k=0}^nf(x_k)}{n+1}-f(x)\right| , &{}\quad |x-x_k|\le \frac{2(b-a)}{n}\mu _0. \end{array} \right. \end{aligned}$$
(5.2)

where \(x_k\), a, b, and \(\mu _0\) are defined exactly as before.

From Eqs. (5.1) and (5.2), we see that the connection weights from the input layer to the hidden layer, the connection weights from the hidden layer to the output layer, and the thresholds are, respectively, equal to \(\frac{n}{2(b-a)}\), \(f(x_k)\) and \(\frac{-\,nx_k}{2(b-a)}\) in the FNNs \(N_{n, 1}\) and \(N_{n, 2}\). Because \(a, b\in R\), \(f(x_k)\in R\) and \(\mu _0\in R^+\) are all constants and \(n\in N^+\) is the number of hidden neurons, the weights and the thresholds in the FNNs \(N_{n, 1}\) and \(N_{n, 2}\) are all fixed constants. Therefore, the error of approximation is

$$\begin{aligned} \big |N_{n, 1}(f, x)-f(x)\big |=\left\{ \begin{array}{ll} |f(x)|, &{}\quad |x-x_k|\ge \frac{4(b-a)}{n}\mu _0,\\ \bigg |\frac{\sum _{k=0}^nf(x_k)\big (1-\frac{1}{2\mu _0}\big |\frac{n}{2(b-a)}(x-a)-k\big |\big )}{\sum _{k=0}^n\big (1-\frac{1}{2\mu _0}\big |\frac{n}{2(b-a)}(x-a)-k\big |\big )}-f(x)\bigg |, &{}\quad |x-x_k|<\frac{4(b-a)}{n}\mu _0, \end{array} \right. \end{aligned}$$
(5.3)

and

$$\begin{aligned}\big |N_{n, 2}(f, x)-f(x)\big | =\left\{ \begin{array}{ll} |f(x)|, &{}\quad |x-x_k|\ge \frac{4(b-a)}{n}\mu _0,\\ \bigg |\frac{\sum _{k=0}^nf(x_k)\big (1-\frac{1}{\mu _0}\big |\frac{n}{2(b-a)}(x-a)-k\big |\big )}{\sum _{k=0}^n\big (1-\frac{1}{\mu _0}\big |\frac{n}{2(b-a)}(x-a)-k\big |\big )}-f(x)\bigg |, &{}\quad \frac{2(b-a)}{n}\mu _0<|x-x_k|<\frac{4(b-a)}{n}\mu _0,\\ |1-f(x)|, &{}\quad |x-x_k|\le \frac{2(b-a)}{n}\mu _0. \end{array} \right. \end{aligned}$$
(5.4)

We take \(a=-1, d=1,\) and \(\mu _0=\frac{1}{4}\) in Eqs. (5.3) and (5.4). Figure 5 shows numerical results on \([-\,1,1]\) for \(n=5, 10, 20, 40, 80, 160\).

Fig. 5 Target function \(f(x)=x^3\) and FNNs \(N_{5,1}(f)\) and \(N_{10,1}(f)\) on the interval \([-\,1,1]\)
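A hedged reproduction sketch of this experiment (reusing `fnn_operator` and `triangle` from the earlier sketches) is given below; here we take \(\mu _0=1/2\) so that the denominator of (3.4) stays bounded away from zero on all of \([-1,1]\), whereas the figures use \(\mu _0=1/4\).

```python
import numpy as np

f = lambda x: x ** 3
a, d = -1.0, 1.0
xs = np.linspace(a, d, 2001)
for n in (5, 10, 20, 40, 80, 160):
    err = np.max(np.abs(fnn_operator(f, triangle, a, d, n, xs) - f(xs)))
    print(f"n = {n:4d}   sup-error of N_{{n,1}} = {err:.3e}")
```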

Example 5.2

We also choose the continuous function \(f(x)=x^3+x^2-5x+3\) as the target function and study the approximation of f(x) on the bounded interval [a,d] by the FNNs with optimized piecewise linear activation functions and fixed weights defined by (3.4).

We again take \(a=-1, d=1,\) and \(\mu _0=\frac{1}{4}\) in Eqs. (5.3) and (5.4). Figures 6 and 7 provide some numerical results on the neighborhoods of the interpolation points for \(n=5, 10\). (Note: Fig. 7 shows magnified views of Fig. 6. The interpolation points of \(N_{5,1}(f)\) and \(N_{10,1}(f)\) are, respectively, − 1.0, − 0.6, − 0.2, 0.2, 0.6, 1.0 and − 1.0, − 0.8, − 0.6, − 0.4, − 0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.0.)

Fig. 6 Target function \(f(x)=x^3+x^2-5x+3\) and FNNs \(N_{5,1}(f)\) and \(N_{10,1}(f)\) on the interval \([-\,1,1]\)

Fig. 7 Target function \(f(x)=x^3+x^2-5x+3\) and FNNs \(N_{5,1}(f)\) and \(N_{10,1}(f)\) on the neighborhoods of the interpolation points (magnified views of Fig. 6). The interpolation points of \(N_{5,1}(f)\) and \(N_{10,1}(f)\) are, respectively, \(-\,1\), \(-\,0.6\), \(-\,0.2\), 0.2, 0.6, 1 and \(-\,1\), \(-\,0.8\), \(-\,0.6\), \(-\,0.4\), \(-\,0.2\), 0, 0.2, 0.4, 0.6, 0.8, 1. Panels a–k show the neighborhoods of these points in turn, from the right neighborhood of \(-\,1\) to the left neighborhood of 1

Example 5.3

We choose the nonnegative density functions \(\varphi _s\) (defined by Eq. (3.5)) as activation functions. The theoretical error bound can be computed as in Example 5.1. We omit the graphs of the corresponding neural networks \(N_{n, s}(f, x)\), since they are almost the same as Figs. 6 and 7.

6 Conclusions and prospects

We have discussed the approximation capability of FNNs from a mathematical point of view. First, the optimized piecewise linear activation functions and the structures of the three types of FNNs with fixed weights were constructed and discussed in detail. Second, the ideal upper bound, lower bound and essential order of approximation precision of these FNNs for continuous functions defined on bounded intervals were provided. Third, the interpolation results proved in this paper show that the representation errors made by these FNNs on the elements of the training set are zero; in other words, we can obtain the weights and the thresholds of an exact neural approximation without training. Finally, we demonstrated some numerical examples that show the effectiveness of the method used in this paper. Our conclusions not only further characterize the intrinsic approximation properties of these FNNs, but also reveal the implicit relationship among the precision of approximation, the number of hidden units and the smoothness of the target function.

We wrap up this paper with the following prospects:

  1. (a)

Although we give the essential approximation order of the three constructed kinds of FNNs, this is only for the Lipschitz class of approximated functions f. The essential approximation order of the three FNNs for other classes of approximated functions is worth further study.

  2. (b)

    It is interesting and significant to extend the main theories in this paper to multivariate functions.

Clearly solving these two problems is not easy, but it is very important and valuable.