Abstract
This paper focuses on noise resistant incremental learning algorithms for single layer feed-forward neural networks (SLFNNs). In a physical implementation of a well trained neural network, faults or noise are unavoidable. As biological neural networks have the ability to tolerate noise, we would like a trained neural network to have a certain ability to tolerate noise too. This paper first develops a noise tolerant objective function that can handle multiplicative weight noise. We assume that multiplicative weight noise exists in the weights between the input layer and the hidden layer, and in the weights between the hidden layer and the output layer. Based on the developed objective function, we propose two noise tolerant incremental extreme learning machine algorithms, namely weight deviation incremental extreme learning machine (WDT-IELM) and weight deviation convex incremental extreme learning machine (WDTC-IELM). Compared to the original extreme learning machine algorithms, the two proposed algorithms are much better at tolerating multiplicative weight noise. Several simulations are carried out to demonstrate the superiority of the two proposed algorithms.
1 Introduction
Extreme learning machine (ELM) algorithms (Huang et al. 2006; Huang and Chen 2007) provide a low-computation solution for constructing a single layer feed-forward neural network (SLFNN). In the ELM concept, the input connection weights between the input and hidden layers are generated randomly. Hence we only need to train the output connection weights between the hidden and output layers. Although the ELM concept uses random hidden nodes, a SLFNN trained by the ELM concept still has the universal approximation ability (Barron 1993; Hornik et al. 1989; Hornik 1991). In the last several years, many applications of ELM have been reported. For example, a modified ELM model for imbalanced data was reported (Li et al. 2018). Also, some works using the ELM concept to handle biological data were reported (Bi et al. 2018; Wang et al. 2017).
There are two kinds of ELM algorithms. One is the batch mode, in which we first generate a number of hidden nodes and then estimate all the output weights at once. The other is the incremental mode, in which we add hidden nodes one by one into the network until a predefined stopping condition is reached. The incremental ELM (IELM) and the convex IELM (CIELM) (Huang et al. 2006; Huang and Chen 2007) are two representative incremental ELM algorithms with simple update rules. Although many ELM algorithms have been developed, few of them have the ability to tolerate network faults and noise.
In hardware implementations of neural networks, noise or faults are prone to occur. For instance, when a neural network is implemented on field-programmable gate array (FPGA) technology, we may use a low precision floating point format to represent connection weights. The roundoff error of the floating point format can be modelled as multiplicative noise (Liu and Kaneko 1969). In an analog implementation, thermal noise and drifts always exist in operational amplifiers. Besides, the precision of analog devices, such as resistors, is specified in terms of percentage error. In addition, when a trained network is implemented using nano-scale devices, transient noise or failures may be introduced during operation (Pajarinen et al. 2011; Mahdiani et al. 2012). Traditional learning algorithms have poor fault or noise tolerance. Since biological neural networks have a certain ability to tolerate noise, we would like to train neural networks that have a certain noise tolerance too.
To handle noise and faults, it is essential to understand how they affect the behaviour or performance of a trained network. Noise or fault tolerant learning algorithms aim at training a network to attain acceptable performance even under noise and fault situations. A survey of various kinds of imperfect conditions in traditional network models, such as radial basis function (RBF) networks, was reported in Martolia et al. (2015). Besides, the failure tolerance of RBF networks has been extensively studied (Leung et al. 2010; Feng et al. 2017; Murakami and Honda 2007). However, few results on the failure tolerance of ELM networks have been reported.
This paper investigates the noise tolerant performance of the SLFNN model trained by the incremental ELM concept. We consider that multiplicative weight noise exists between the input and hidden layers, and between the hidden and output layers. Firstly, a noise tolerant training objective function for SLFNNs is formulated. Afterwards, two incremental ELM algorithms, namely weight deviation incremental extreme learning machine (WDT-IELM) and weight deviation convex incremental extreme learning machine (WDTC-IELM), are derived.
In the WDT-IELM algorithm, the hidden nodes are added into the existing network incrementally, one by one. After a new hidden node is added, the previously trained output weights are not modified.
In the WDTC-IELM algorithm, we use a strategy similar to WDT-IELM to create a SLFNN, but we use a simple updating rule to modify the previously trained output weights.
We show that for the two proposed ELM algorithms, the training objective values are non-increasing at each training iteration. We use several simulations on commonly used datasets to validate the superiority of the two proposed algorithms. Compared to the original incremental ELM algorithms, the two proposed algorithms are much better at tolerating multiplicative weight noise. In addition, we perform paired t-tests to show that the improvement from using the proposed algorithms is statistically significant.
The rest of the paper is organized as follows. Background on ELM is given in Sect. 2. In Sect. 3, the weight noise models are presented and the noise tolerant objective function is derived. Sect. 4 presents the two proposed algorithms and shows that, during training, the objective values are non-increasing. The simulation results are presented in Sect. 5. Sect. 6 concludes the paper.
2 Mathematical background on extreme learning machine
The standard ELM was developed to train SLFNNs (Huang et al. 2006; Huang and Chen 2007; Huang et al. 2006). In a SLFNN, there are three layers, namely, the input, hidden and output layers. In the ELM concept, the input weights between the input layer and the hidden layer are randomly generated. They do not need to be learned or tuned. Only the output weights between the hidden layer and the output layer need to be trained. Hence, the computational cost of learning is much lower than that of traditional learning algorithms such as gradient descent (Guély and Siarry 1993).
This paper uses the ELM concept to solve the nonlinear regression problem. Let \({\mathbb {D}}_{train}=\{({\varvec{x}}_k,t_k): \, {\varvec{x}}_k \, \in \, {\mathbb {R}}^D, t_k \, \in {\mathbb {R}}, \quad k=1,\ldots ,N \}\) be the training set, where D is the number of input features, N is the number of training samples, and \({\varvec{x}}_k\) and \(t_k\) are the input vector and target output of the kth sample, respectively. Similarly, the test set is denoted as \({\mathbb {D}}_{test}=\{({\varvec{x}}'_{k'},t'_{k'}):\, {\varvec{x}}'_{k'} \, \in \, {\mathbb {R}}^{D}, t'_{k'} \in {\mathbb {R}}, \quad k'=1,\ldots ,N' \}\), where \(N'\) is the number of samples in the test set.
The output of a SLFNN with m hidden nodes is equal to
\[ f_m({\varvec{x}}) = \sum_{j=1}^{m} \beta _j h_j({\varvec{x}}), \]
where \(\beta _j\) is the jth output weight, and \(h_j({\varvec{x}})\) is the output of the jth hidden node. There are several possible activation functions. For instance, in the case of the sigmoid activation function, \(h_j(\cdot )\) is given by
\[ h_j({\varvec{x}}) = \frac{1}{1+\exp \left( -({\varvec{a}}_j^\text {T}{\varvec{x}}+b_j)\right) }, \]
where \({\varvec{a}}_j\) and \(b_j\) are the input weights and input bias, respectively, of the jth hidden node. Grouping \({\varvec{a}}_j\) and \(b_j\) together, (3) can be rewritten as
\[ h_j({\varvec{x}}) = \frac{1}{1+\exp \left( -{\varvec{w}}_j^\text {T}{\varvec{o}}\right) }, \]
where \({\varvec{w}}_j=[{\varvec{a}}_j^\text {T},b_j]^\text {T}\), and \({\varvec{o}}=[{\varvec{x}}^\text {T},1]^\text {T}\). In the ELM concept, the input weight vectors, \({\varvec{w}}_j\)’s, are generated randomly.
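Given all training samples, the training set error is equal to
\[ \sum_{k=1}^{N} \Big ( t_k - \sum_{j=1}^{m} \beta _j h_j({\varvec{x}}_k) \Big )^2 . \]
Let \(\varvec{t}\) be the collection of the target outputs, given by
\[ \varvec{t} = [t_1,\ldots ,t_N]^\text {T}, \]
and let \(\varvec{h}_j\) be the collection of the jth hidden node outputs for all input samples, given by
\[ \varvec{h}_j = [h_j({\varvec{x}}_1),\ldots ,h_j({\varvec{x}}_N)]^\text {T}. \]
Equation (4) can be written in a compact form, given by
\[ \Vert \varvec{e}_m\Vert ^2_2 = \Vert \varvec{t} - \varvec{H}_m \varvec{\beta }_m \Vert ^2_2, \]
where \(\varvec{\beta }_m=[\beta _1,\ldots ,\beta _m]^\text {T}\), and
\[ \varvec{H}_m = [\varvec{h}_1,\ldots ,\varvec{h}_m]. \]
As mentioned above, the input weights \(\varvec{w}_j\)’s are randomly generated. During training, we do not need to adjust them. The ELM concept uses the least squares approach to obtain the output weights. In the batch mode ELM, given m hidden nodes, the optimal output weight vector that minimizes the training set error is given by
\[ \varvec{\beta }_m = \left( \varvec{H}_m^\text {T}\varvec{H}_m\right) ^{-1}\varvec{H}_m^\text {T}\varvec{t}, \]
provided that \(\varvec{H}_m^\text {T}\varvec{H}_m\) is invertible.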
Instead of using the batch mode to find the weights, the ELM concept also has an incremental mode, in which we incrementally add hidden nodes into the existing network until the stopping condition is reached. When we insert a new hidden node, we need to determine its output weight only. The IELM and CIELM are two incremental ELM algorithms. In the IELM algorithm, after a new hidden node is inserted, all the previously trained output weights are unchanged. The difference between the two algorithms is that the CIELM algorithm uses a simple updating rule to modify all the existing output weights. Although the ELM concept simplifies the creation of a SLFNN, few ELM algorithms have the ability to tolerate noise. In the rest of the paper, we first define a noise tolerant objective function, and then develop the noise tolerant versions of IELM and CIELM.
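The incremental mode can be illustrated with a short sketch. This is a minimal illustration and not the authors' code: it assumes sigmoid hidden nodes, input weights drawn uniformly from [−1, 1], and the standard IELM rule in which the new output weight is the projection of the current residual onto the new hidden node output (Huang et al. 2006). All function and variable names are hypothetical.

```python
import math
import random


def hidden_output(w, x):
    """Sigmoid output of one hidden node; w holds the input weights followed by the bias."""
    z = sum(wl * xl for wl, xl in zip(w, x)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))


def ielm_fit(inputs, targets, n_nodes, seed=0):
    """Add hidden nodes one by one; earlier output weights are never modified."""
    rng = random.Random(seed)
    dim = len(inputs[0]) + 1  # input weights plus one bias
    residual = list(targets)  # the empty network outputs zero, so e_0 = t
    nodes, errors = [], [sum(r * r for r in residual)]
    for _ in range(n_nodes):
        w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]  # random, then fixed
        h = [hidden_output(w, x) for x in inputs]
        # Least squares weight of the new node alone: (e . h) / (h . h)
        beta = sum(r * v for r, v in zip(residual, h)) / sum(v * v for v in h)
        residual = [r - beta * v for r, v in zip(residual, h)]
        nodes.append((w, beta))
        errors.append(sum(r * r for r in residual))
    return nodes, errors


def ielm_predict(nodes, x):
    """Network output: weighted sum of the hidden node outputs."""
    return sum(beta * hidden_output(w, x) for w, beta in nodes)
```

Because each new weight minimizes the squared residual along the new node, the training error recorded in `errors` cannot increase from one node to the next. This rule ignores weight noise entirely, which is the gap the proposed algorithms address.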
3 Weight noise model and objective function
In this section, we consider that a SLFNN is affected by multiplicative weight noise in the input weight vectors \(\varvec{w}_j\)’s and the output weights \(\beta _j\)’s. We will first describe the noise model and then develop a noise tolerant objective function.
3.1 Weight noise model
When we implement a trained network, weight noise is prone to occur. Weight noise can be regarded as the deviation of a weight from its nominal value in a well trained neural network. For instance, after training a neural network, we may implement the trained network on hardware such as an FPGA. To do this, we may use a low precision floating point format to represent connection weights. The roundoff error of the floating point format can cause an implemented weight to deviate from its nominal value. Also, in an analog implementation, noise is unavoidable. One commonly used noise model is multiplicative weight noise (Burr 1991; Liu and Kaneko 1969). In this model, the difference between the implemented weight value and its nominal value is proportional to the nominal value.
Let \({w}_{jl}\) be the original value of the lth element in the jth input weight vector. Under the multiplicative noise model, the deviation of an input weight \(w_{jl}\) from its nominal value is given by
\[ \delta _{jl} = \upsilon _{jl} w_{jl}, \]
where \(\upsilon _{jl}\)’s are independent and identically distributed (iid) random variables (RVs). Their mean is equal to zero and their variance is equal to \(\sigma ^2_w\). In other words, the implemented value of an input weight is given by
\[ {\tilde{w}}_{jl} = w_{jl} + \delta _{jl} = (1+\upsilon _{jl}) w_{jl}. \]
In (9), the magnitude of the noise component \(\delta _{jl}\) is proportional to that of the nominal value \(w_{jl}\).
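To make the noise model concrete, the sketch below perturbs a list of nominal weights with multiplicative noise. It is an illustration only. The model requires just zero-mean iid noise with a given variance, so the Gaussian distribution, the function name `apply_multiplicative_noise`, and its parameters are assumptions.

```python
import random


def apply_multiplicative_noise(weights, variance, rng=None):
    """Return implemented weights w * (1 + v), with v drawn iid with zero mean.

    Assumption: v is Gaussian with the given variance; the noise model itself
    only fixes the mean and the variance.
    """
    rng = rng or random.Random()
    std = variance ** 0.5
    return [w * (1.0 + rng.gauss(0.0, std)) for w in weights]
```

Note that a weight whose nominal value is zero stays at zero, and that larger weights receive proportionally larger deviations, which is the defining property of the multiplicative model.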
Given the deviations \(\delta _{jl}\), for all \(l=1,\ldots ,D+1\), of the input weights for the jth hidden node, the hidden node output is
Note that \(\varvec{o}=[\varvec{x}^\text {T},1]^\text {T}\) and \({\tilde{\varvec{w}}}_j=[{\tilde{w}}_{j1},\ldots ,{\tilde{w}}_{j(D+1)}]^\text {T}\), where D is the number of input features of the neural network. We can use the first order Taylor series to expand (10), given by
where
When the sigmoid function is used as the activation function,
Similarly, under the multiplicative noise, the value of an output weight becomes
\[ {\tilde{\beta }}_j = (1+\zeta _j)\beta _j, \]
where \(\zeta _{j}\)’s are iid RVs. Their mean is equal to zero and their variance is equal to \(\sigma ^2_\beta\).
Hence for a given input pattern \(\varvec{x}\), the weighted output of a noisy hidden node is given by
Since \(\zeta _{j}\)’s and \(\upsilon _{jl}\)’s are iid RVs and have zero mean, the expected values of the weighted outputs are given by
where \(\left\langle \cdot \right\rangle\) is the expectation operator. Also, the variances of \(\upsilon _{jl}\)’s and \(\zeta _{j}\)’s are equal to \(\sigma _w^2\) and \(\sigma _\beta ^2\), respectively. Hence the expected squares of the weighted outputs are given by
Furthermore, for \(j\ne j'\), we have
3.2 Noise tolerant objective function
Traditional ELM algorithms for regression minimize the squared error between the network output and the target output. In the noise situation, we propose to minimize the expected error over all noise patterns. For a particular noise pattern, the training set error of a noisy network is given by
From (16)–(18), the expected error over all noise patterns is given by
where
In (21), the expected error of a SLFNN contains three terms. The first term \(\Vert \varvec{e}_m\Vert ^2_2\) is the error of a noiseless SLFNN. The second term is the degradation from the noise in the output weights \(\beta _j\)’s. The third term is the degradation from the noise in the input weights \(w_{jl}\)’s. In the ELM concept, the input weights \(w_{jl}\) are randomly generated, and their values are then fixed. Only the output weights need to be trained.
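The three-term structure can be seen from a short derivation. The following is a sketch under the first order approximation described in Sect. 3.1, not a reproduction of the paper's equations. The symbol \(g_{jl}\) for the partial derivative of the hidden node output is our own notation, and the grouping of the terms may differ from the one used in (21).

```latex
% Assumed notation: g_{jl}(x) = \partial h_j(x) / \partial w_{jl};
% for the sigmoid, g_{jl}(x) = h_j(x) (1 - h_j(x)) o_l.
\tilde{\beta}_j \tilde{h}_j(\boldsymbol{x})
  \approx (1+\zeta_j)\,\beta_j
  \Big( h_j(\boldsymbol{x}) + \sum_{l=1}^{D+1} g_{jl}(\boldsymbol{x})\,\upsilon_{jl}\,w_{jl} \Big)

% Zero-mean, independent noise gives the first two moments:
\big\langle \tilde{\beta}_j \tilde{h}_j(\boldsymbol{x}) \big\rangle = \beta_j h_j(\boldsymbol{x}),
\qquad
\big\langle \tilde{\beta}_j^2 \tilde{h}_j^2(\boldsymbol{x}) \big\rangle
  = (1+\sigma_\beta^2)\,\beta_j^2
  \Big( h_j^2(\boldsymbol{x}) + \sigma_w^2 \sum_{l=1}^{D+1} g_{jl}^2(\boldsymbol{x})\,w_{jl}^2 \Big)

% Summing the expected squared error over the N training samples:
\Big\langle \sum_{k=1}^{N} \Big( t_k - \sum_{j=1}^{m} \tilde{\beta}_j \tilde{h}_j(\boldsymbol{x}_k) \Big)^2 \Big\rangle
  \approx \|\boldsymbol{e}_m\|_2^2
  + \sigma_\beta^2 \sum_{j=1}^{m} \beta_j^2 \|\boldsymbol{h}_j\|_2^2
  + (1+\sigma_\beta^2)\,\sigma_w^2 \sum_{j=1}^{m} \beta_j^2
    \sum_{k=1}^{N} \sum_{l=1}^{D+1} g_{jl}^2(\boldsymbol{x}_k)\,w_{jl}^2
```

The cross terms between different hidden nodes contribute only through their means because the noise components of different nodes are independent, so both penalty terms are weighted sums of \(\beta_j^2\) and act like a node-dependent regularizer on the output weights.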
4 Noise tolerant incremental algorithms
This section presents the two incremental ELM algorithms, namely WDT-IELM and WDTC-IELM, for training SLFNNs. The WDT-IELM algorithm adds hidden nodes into the network one by one and leaves the existing output weights untouched. The WDTC-IELM algorithm uses the same strategy to add hidden nodes into the network, but uses a simple updating rule to modify the existing output weights.
4.1 WDT-IELM
The WDT-IELM algorithm is a noise tolerant version of the original IELM. The WDT-IELM algorithm incrementally adds hidden nodes one by one into the network. Suppose that a SLFNN already has \(m-1\) hidden nodes. The incremental strategy is that we determine the value of the output weight \(\beta _m\) of the newly inserted node and do not modify the existing output weights \(\{\beta _1,\ldots ,\beta _{m-1}\}\). According to that strategy, we have the following recursive relationships for \(\varvec{f}_m\), \(\kappa _m\), and \(\tau _m\), given by
With (27), the recursive equation for the error vector is given by
Based on (21), and (27)–(30), \(J_m\) can be expressed as
Let
To maximize the reduction in the expected error over all noise patterns, we should consider
Then the optimal value of \(\beta _m\) is given by
where
With this optimal value, \(R_m\) is given by
Clearly, the expected training error of the noisy network does not increase at any iteration. The summary of the WDT-IELM algorithm is given in Algorithm 1. It should be noted that, during training, we do not need to keep \(\tau _m\) and \(\kappa _m\).
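The node-by-node strategy of this subsection can be sketched as follows. The sketch is not Algorithm 1. It assumes a first order expected-error objective made of the noiseless error plus one penalty for output weight noise and one for input weight noise, sigmoid hidden nodes, and input weights drawn uniformly from [−1, 1]. The penalty form and all names (`node_statistics`, `noise_aware_incremental_fit`, `curvature`) are our own assumptions.

```python
import math
import random


def node_statistics(w, inputs):
    """Return the hidden outputs h over all samples and the input-noise term s.

    s sums (dh/dw_l * w_l)^2 over samples and weights, using the sigmoid
    derivative h * (1 - h) * o_l with o = [x, 1].
    """
    h, s = [], 0.0
    for x in inputs:
        o = list(x) + [1.0]
        out = 1.0 / (1.0 + math.exp(-sum(wl * ol for wl, ol in zip(w, o))))
        h.append(out)
        slope = out * (1.0 - out)
        s += sum((slope * ol * wl) ** 2 for wl, ol in zip(w, o))
    return h, s


def noise_aware_incremental_fit(inputs, targets, n_nodes, var_w, var_beta, seed=0):
    """Insert nodes one by one, choosing each beta to minimize the assumed
    expected error; previously trained output weights stay untouched."""
    rng = random.Random(seed)
    dim = len(inputs[0]) + 1
    residual = list(targets)
    objective = sum(r * r for r in residual)
    nodes, history = [], [objective]
    for _ in range(n_nodes):
        w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        h, s = node_statistics(w, inputs)
        # Coefficient of beta^2 in the assumed objective; always positive.
        curvature = (1.0 + var_beta) * (sum(v * v for v in h) + var_w * s)
        projection = sum(r * v for r, v in zip(residual, h))
        beta = projection / curvature
        residual = [r - beta * v for r, v in zip(residual, h)]
        objective -= projection ** 2 / curvature  # the reduction is never negative
        nodes.append((w, beta))
        history.append(objective)
    return nodes, history
```

With both variances set to zero the rule collapses to the plain IELM projection. With positive variances each output weight is shrunk, more strongly for nodes whose outputs are sensitive to their input weights, and the recorded objective in `history` cannot increase.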
4.2 WDTC-IELM
In Huang and Chen (2007), the CIELM algorithm was presented. It aims at improving the approximation ability of IELM. The performance demonstration in Huang and Chen (2007) showed that the CIELM algorithm outperforms the original IELM algorithm. However, it was designed for the noiseless situation. Hence, it is equally important to develop a noise tolerant version of CIELM.
In the proposed WDTC-IELM algorithm, we also incrementally insert hidden nodes into the network one by one. After determining the current output weight \(\beta _m\), we update all the previously trained weights, given by
With (37), the recursive equations for \(\varvec{f}_m\), \(\varvec{e}_m\), \(\kappa _m\), and \(\tau _m\) become
From (38) and (41), the training objective \(J_m\) can be expressed as
Then we can obtain the difference between two consecutive iterations, given by
Similar to the WDT-IELM algorithm, in the WDTC-IELM the optimal value of \(\beta _m\) is given by
where
With this optimal value, \(R_m\) is given by
where
In (48), the denominator \(\varOmega\) is positive. Hence, the expected training error of the noisy network does not increase at any iteration. The summary of the WDTC-IELM algorithm is given in Algorithm 2. It can be seen that in the WDTC-IELM, we need to keep two additional variables \(\tau _m\) and \(\kappa _m\).
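A short derivation shows why a positive denominator arises for a convex update. The sketch below is not the paper's derivation. It assumes the CIELM-style rule \(\beta_j \leftarrow (1-\beta_m)\beta_j\) for \(j<m\) (Huang and Chen 2007) and an objective of the form noiseless error plus accumulated noise penalty. The symbols \(\boldsymbol{d}_m\), \(P_{m-1}\) and \(p_m\) are our own and need not coincide with \(\kappa_m\), \(\tau_m\) and \(\varOmega\).

```latex
% Assumed convex update of the network output and the error vector:
\boldsymbol{f}_m = (1-\beta_m)\,\boldsymbol{f}_{m-1} + \beta_m \boldsymbol{h}_m,
\qquad
\boldsymbol{e}_m = \boldsymbol{e}_{m-1} - \beta_m \boldsymbol{d}_m,
\qquad
\boldsymbol{d}_m = \boldsymbol{h}_m - \boldsymbol{f}_{m-1}

% Assumed objective: P_{m-1} >= 0 is the accumulated noise penalty of the
% existing nodes, and p_m > 0 is the penalty per unit beta_m^2 of the new node.
J_{m-1} = \|\boldsymbol{e}_{m-1}\|_2^2 + P_{m-1},
\qquad
J_m(\beta_m) = \|\boldsymbol{e}_{m-1} - \beta_m \boldsymbol{d}_m\|_2^2
             + (1-\beta_m)^2 P_{m-1} + \beta_m^2\, p_m

% Setting dJ_m / d\beta_m = 0:
\beta_m = \frac{\boldsymbol{e}_{m-1}^{\mathrm{T}} \boldsymbol{d}_m + P_{m-1}}
               {\|\boldsymbol{d}_m\|_2^2 + P_{m-1} + p_m},
\qquad
J_{m-1} - J_m = \frac{\big( \boldsymbol{e}_{m-1}^{\mathrm{T}} \boldsymbol{d}_m + P_{m-1} \big)^2}
                     {\|\boldsymbol{d}_m\|_2^2 + P_{m-1} + p_m} \;\ge\; 0
```

Under these assumptions the accumulated penalty follows \(P_m = (1-\beta_m)^2 P_{m-1} + \beta_m^2 p_m\), which is consistent with the need to carry extra running quantities from one iteration to the next.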
5 Numerical study
In this section, we evaluate the performance of the proposed algorithms with some real life datasets obtained from the UCI machine learning repository (Lichman 2013). The datasets that we use to validate the performance of the algorithms are Abalone, Housing price, Concrete Compressive strength, Airfoil Self Noise (ASN), BodyFat, Chemical Sensor, and Building Energy. Table 1 presents the properties of the datasets. The datasets are pre-processed. The target outputs of these datasets are normalized to the range of [0, 1], while the input features of the datasets are normalized to the range of [−1, 1]. In addition, we randomly generate the input weights of the hidden nodes from the range [−1, 1].
For fair comparison, we use the tenfold evaluation method. The samples of a dataset are randomly partitioned into ten subsets. The summary of the partitioning is given in Table 2. In our simulation, one subset is used as the test set, and the remaining nine subsets are used as training data. The noise levels that we test are \(\{\sigma ^2_\beta =\sigma ^2_w=0.04, \sigma ^2_\beta =\sigma ^2_w=0.09, \sigma ^2_\beta =\sigma ^2_w=0.16, \sigma ^2_\beta =\sigma ^2_w=0.25 \}\). According to the analysis in Liu and Kaneko (1969), the noise level \(\sigma ^2_\beta =\sigma ^2_w=0.04\) corresponds to around 2–3 mantissa bits in the digital implementation. For the other noise levels, we can consider that the standard deviation of the noise is around 30–50 % of the nominal value in the analog implementation.
5.1 Number of hidden nodes
We use three datasets to demonstrate how the test set errors change with respect to the number of hidden nodes. The three datasets are Abalone, Concrete, and Boston Housing. Three noise levels \(\{\sigma ^2_\beta =\sigma ^2_w=0.04, \sigma ^2_\beta =\sigma ^2_w=0.09,\sigma ^2_\beta =\sigma ^2_w=0.25 \}\) are considered. Figure 1 shows the test set MSE versus the number of nodes for a typical run. It can be seen that the test set errors of the CIELM are much higher than those of the other three algorithms. That is, the noise tolerance of the original CIELM is very poor. For the IELM, WDT-IELM, and WDTC-IELM algorithms, when the number of hidden nodes is around 400–500, the test set error decreases very slowly. Thus, we treat 500 hidden nodes as a reference point for the deeper analysis in the rest of the paper.
From the figure, the WDT-IELM algorithm is better than the IELM algorithm. When the noise level is high, the improvement in the test set error becomes more significant. In addition, when we use the WDTC-IELM algorithm, we can further improve the test set MSE. For instance, for the Abalone dataset with noise level equal to \(\sigma ^2_\beta =\sigma ^2_w=0.04\), the test set MSE of the original IELM algorithm is 0.01074. When we use the WDT-IELM algorithm, we can reduce the test set MSE to 0.01053. Using the WDTC-IELM algorithm, we can further reduce the test set MSE to 0.006246. The MSE difference between IELM and WDTC-IELM is 0.005172.
5.2 Performance comparison
To further investigate the performance of those algorithms, we use the tenfold evaluation strategy. The setting of the tenfold evaluation is shown in Table 2. The average test set performance over the ten folds for the seven datasets is summarized in Table 3. For an easy and quick view of the results in Table 3, we also provide a chart of the performance in Fig. 2. The table contains \(7 \times 4 \times 4=112\) entries. Each entry is the average MSE over the ten folds (ten runs).
From the table, the test set MSE values from WDT-IELM are smaller than those of IELM. This is most evident at high noise levels, as shown in Fig. 2. Besides, the test set MSE values from WDTC-IELM are much smaller than those of the other three algorithms. For instance, for the BodyFat dataset, when the noise level is 0.04, the test set MSE value of the original IELM is 0.020329. When we use WDT-IELM, we reduce the test set MSE value to 0.019707. Furthermore, when we use the WDTC-IELM algorithm, the test set MSE value can be reduced to 0.010461. The improvement of using the WDTC-IELM algorithm is more significant for high noise levels. When the noise level is 0.25, the test set MSE value of the original IELM is 0.063316. When we use WDT-IELM, we reduce the test set MSE value to 0.046406. Furthermore, when we use the WDTC-IELM algorithm, the test set MSE value can be reduced to 0.012829.
Furthermore, we find that the WDTC-IELM is relatively insensitive to the noise level. Consider the Abalone dataset.
- When the noise level is 0.04, the test set MSE of IELM is 0.011783. When the noise level increases to 0.25, the test set MSE increases to 0.031176.
- When the noise level is 0.04, the test set MSE of CIELM is 0.020905. When the noise level increases to 0.25, the test set MSE increases to 0.089067.
- When the noise level is 0.04, the test set MSE of WDT-IELM is 0.011560. When the noise level increases to 0.25, the test set MSE increases to 0.023635.
- When the noise level is 0.04, the test set MSE of WDTC-IELM is 0.007181. When the noise level increases to 0.25, the test set MSE increases to 0.008544.
The same behaviour is observed in the other six datasets. Since the WDTC-IELM is the best among the four algorithms, one may argue that we do not need to consider the WDT-IELM. However, the WDTC-IELM algorithm needs to update all the previously trained weights and has a more complicated training procedure, as shown in Algorithms 1 and 2.
5.3 Paired T-test analysis
From Fig. 2 and Table 3, in terms of average test set MSE, the two proposed algorithms are better than the two original algorithms. In this section, we use a significance test, namely the paired t-test, to check whether the improvements are statistically significant. Since the CIELM has very poor performance, we do not perform the paired t-test on it.
Tables 4 and 5 summarize the paired t-test results, i.e., the IELM versus WDT-IELM, and IELM versus WDTC-IELM. For the one-tail test with 10 trials and 95% confidence level, the critical t-value is 1.8331.
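The test statistic used in this comparison can be computed with the standard paired t formula. The sketch below is an illustration with hypothetical names; it takes the per-fold test set MSE values of two algorithms and returns the t statistic of their differences. The critical value is the one quoted in the text for a one-tailed test with ten trials (nine degrees of freedom).

```python
import math
import statistics

CRITICAL_T = 1.8331  # one-tailed, 9 degrees of freedom, 95% confidence level


def paired_t_statistic(baseline, proposed):
    """t statistic of the per-fold improvements (baseline minus proposed)."""
    diffs = [b - p for b, p in zip(baseline, proposed)]
    mean = statistics.mean(diffs)
    std = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return mean / (std / math.sqrt(len(diffs)))


def is_significant(baseline, proposed, critical=CRITICAL_T):
    """True when the average improvement exceeds the one-tailed critical value."""
    return paired_t_statistic(baseline, proposed) > critical
```

The critical value only applies when there are exactly ten paired folds; for a different number of folds it would have to be looked up again for the matching degrees of freedom.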
Before we perform the paired t-test, we should check whether the data pass a normality test. In this paper, we use the Anderson–Darling goodness-of-fit hypothesis test with a critical p value of 0.05. That is, a set of data passes the test if its p value is greater than 0.05. Since there are three algorithms, four noise levels, and seven datasets, there are 84 sets of data. Most of them pass the normality test; only nine cases do not. The nine cases appear in three datasets: the ASN dataset, the Chemical Sensor dataset, and the Housing dataset. In the ASN dataset, there are three cases: WDT-IELM (noise level = 0.04) with p value equal to 0.0498, WDT-IELM (noise level = 0.09) with p value equal to 0.0162, and IELM (noise level = 0.09) with p value equal to 0.0313. In the Housing dataset, there are two cases: IELM (noise level = 0.16) with p value equal to 0.0481 and IELM (noise level = 0.25) with p value equal to 0.0381. In the Chemical Sensor dataset, there are four cases: IELM (noise level = 0.16) with p value equal to 0.028, IELM (noise level = 0.25) with p value equal to 0.0129, WDT-IELM (noise level = 0.16) with p value equal to 0.03, and WDT-IELM (noise level = 0.25) with p value equal to 0.0345.
In Table 4, all t-values are greater than the critical t-value (i.e. 1.8331). Besides, all confidence intervals of the average improvements exclude zero. For example, in the BodyFat dataset with the noise level equal to 0.25, the t-value is 89.1, which is greater than 1.8331. Besides, the confidence interval of the average improvement is [0.016480, 0.017339]. With these results, there is strong evidence that the WDT-IELM is better than IELM.
Again, in Table 5, all the t-values are much greater than the critical t-value. This shows that the improvement of WDTC-IELM is statistically significant too.
As mentioned above, there are a few cases that do not pass the normality test. However, from Table 3, for those cases, the improvements of using WDT-IELM and WDTC-IELM are greater than the standard deviations of IELM, WDT-IELM and WDTC-IELM. Hence, we can conclude that the improvements of WDT-IELM and WDTC-IELM are significant.
For example, for the ASN dataset with the noise level equal to 0.09, the test set MSE of IELM is 0.049538 and the standard deviation is 0.002893. When we use the WDT-IELM, the test set MSE is 0.044534 and the standard deviation is 0.002797. Clearly, the improvement of using WDT-IELM is around 0.005004, which is around two times the standard deviations. When we use the WDTC-IELM, the test set MSE is 0.018152 and the standard deviation is 0.002. Clearly, the improvement of using WDTC-IELM is around 0.031386, which is around nine times the standard deviations.
5.4 Performance improvement ratio
We also present the performance improvement ratios of WDT-IELM and WDTC-IELM. We calculate the performance improvement ratio as
\[ \frac{P_e - P_p}{P_e} \times 100\%, \]
where \(P_e\) and \(P_p\) are the performances of the existing and proposed models, respectively. Table 6 summarizes the performance improvement ratios of WDT-IELM and WDTC-IELM. The 3rd and 4th columns show the performance improvement ratios of WDT-IELM and WDTC-IELM to IELM, respectively. The 5th and 6th columns show the performance improvement ratios of WDT-IELM and WDTC-IELM to CIELM. The table clearly shows that our algorithms outperform the existing ones. For example, in the ASN dataset we have the following performance improvement ratios.
- When the noise level is 0.04, the improvement ratio of WDT-IELM to IELM is 3.06%. When the noise level increases to 0.25, the improvement ratio increases to 29.14%.
- When the noise level is 0.04, the improvement ratio of WDTC-IELM to IELM is 42.86%. When the noise level increases to 0.25, the improvement ratio increases to 77.54%.
- When the noise level is 0.04, the improvement ratio of WDT-IELM to CIELM is 56.37%. When the noise level increases to 0.25, the improvement ratio increases to 82.06%.
- When the noise level is 0.04, the improvement ratio of WDTC-IELM to CIELM is 74.28%. When the noise level increases to 0.25, the improvement ratio increases to 94.32%.
The same trend is observed in the other six datasets. Furthermore, Fig. 3 gives a visual summary of Table 6. From the figure and table, the improvement ratios of our proposed algorithms (WDT-IELM and WDTC-IELM) are more significant for high noise levels.
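As a worked example of an improvement ratio, the sketch below assumes that performance is measured by test set MSE and that the ratio is the relative MSE reduction in percent. Whether Table 6 uses exactly this definition is an assumption, and the function name is hypothetical. The MSE values are those quoted in Sect. 5.3 for the ASN dataset at noise level 0.09.

```python
def improvement_ratio(existing_mse, proposed_mse):
    """Relative reduction in test set MSE, in percent (assumed definition)."""
    return 100.0 * (existing_mse - proposed_mse) / existing_mse


# ASN dataset, noise level 0.09 (values quoted in Sect. 5.3)
ielm_mse, wdt_mse, wdtc_mse = 0.049538, 0.044534, 0.018152
wdt_ratio = improvement_ratio(ielm_mse, wdt_mse)    # roughly 10.1 percent
wdtc_ratio = improvement_ratio(ielm_mse, wdtc_mse)  # roughly 63.4 percent
```

Under this definition a positive ratio means the proposed model has the lower MSE, and a ratio of 100% would correspond to a zero test set error.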
6 Conclusion
In this paper, we have developed a noise resistant objective function for concurrent multiplicative weight noise in the input weights and output weights. Based on the developed objective function, we then proposed two incremental ELM algorithms that can handle weight noise. We also showed that during the incremental training, the training objective is non-increasing. From the simulation results, the proposed incremental algorithms are much better than the original incremental algorithms. Besides, based on the paired t-test results and the performance improvement ratio results, the improvement of the proposed algorithms is statistically significant.
References
Barron AR (1993) Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans Inf Theory 39(3):930–945
Bi X, Ma H, Li J, Ma Y, Chen D (2018) A positive and unlabeled learning framework based on extreme learning machine for drug-drug interactions discovery. J Ambient Intell Human Comput. https://doi.org/10.1007/s12652-018-0960-7
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed 2019
Burr JB (1991) Digital neural network implementations. Neural Netw Concept Appl Implement 3:237–285
Feng RB, Han ZF, Wan WY, Leung CS (2017) Properties and learning algorithms for faulty RBF networks with coexistence of weight and node failures. Neurocomputing 224:166–176
Guély F, Siarry P (1993) Gradient descent method for optimizing various fuzzy rule bases. In: Second IEEE international conference on fuzzy systems, 1993, pp 1241–1246
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366
Hornik K (1991) Approximation capabilities of multilayer feedforward networks. Neural Netw 4(2):251–257
Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1–3):489–501
Huang GB, Chen L, Siew CK (2006) Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans Neural Netw 17(4):879–892
Huang GB, Chen L (2007) Convex incremental extreme learning machine. Neurocomputing 70(16–18):3056–3062
Leung CS, Wang HJ, Sum J (2010) On the selection of weight decay parameter for faulty networks. IEEE Trans Neural Netw 21(8):1232–1244
Li Y, Zhang S, Yin Y, Xiao W, Zhang J (2018) Parallel one-class extreme learning machine for imbalance learning based on Bayesian approach. J Ambient Intell Human Comput. https://doi.org/10.1007/s12652-018-0994-x
Liu B, Kaneko T (1969) Error analysis of digital filters realized with floating-point arithmetic. Proc IEEE 57(10):1735–1747
Mahdiani HR, Fakhraie SM, Lucas C (2012) Relaxed fault-tolerant hardware implementation of neural networks in the presence of multiple transient errors. IEEE Trans Neural Netw Learn Systems 23(8):1215–1228
Martolia R, Jain A, Singla L (2015) Analysis & survey on fault tolerance in radial basis function networks. In: 2015 IEEE international conference on computing, communication & automation (ICCCA), pp 469–473
Murakami M, Honda N (2007) Fault tolerance comparison of IDS models with multilayer perceptron and radial basis function networks. In: International joint conference on neural networks 2007 (IJCNN2007), pp 1079–1084, IEEE
Pajarinen J, Peltonen J, Uusitalo MA (2011) Fault tolerant machine learning for nanoscale cognitive radio. Neurocomputing 74(5):753–764
Wang SJ, Muhammad K, Phillips P, Dong Z, Zhang YD (2017) Ductal carcinoma in situ detection in breast thermography by extreme learning machine and combination of statistical measure and fractal dimension. J Ambient Intell Human Comput. https://doi.org/10.1007/s12652-017-0639-5
Acknowledgements
The work is supported by a research grant from City University of Hong Kong (No. 9610431).
Cite this article
Adegoke, M., Wong, H.T., Leung, A.C.S. et al. Two noise tolerant incremental learning algorithms for single layer feed-forward neural networks. J Ambient Intell Human Comput 14, 15643–15657 (2023). https://doi.org/10.1007/s12652-019-01488-8
DOI: https://doi.org/10.1007/s12652-019-01488-8