
1 Introduction

In recent years, deep learning methods have become attractive because of their success in multiple areas. Convolutional neural networks (CNN) have won many competitions against conventional methods in visual tasks, and recurrent neural network (RNN) based models excel at processing sequential inputs [10]. These methods use the gradient-based back-propagation (BP) algorithm to train the large number of parameters in their networks. As proposed in [13], the main idea of the BP algorithm is to tune the weights and biases following the gradient of the loss function. However, due to the large computational cost, some modern neural networks may need several weeks to finish training [11]. Moreover, overfitting [6] is another serious problem, causing a model to perform well during training while achieving poor performance at test time.

Meanwhile, some randomization based neural networks have been proposed to overcome these flaws of BP-based models [14, 15, 17]. The weights and biases in these models are randomly generated and kept fixed during the training process; only the parameters of the output layer are obtained, via a closed-form solution [16]. The random vector functional link network (RVFL) is a typical single-hidden-layer randomized neural network [12]. It has a direct link that conveys information from the input layer straight to the output layer, which is useful because the output layer then receives both the linear original features and the non-linear transformed features. The newest variant of this model, the ensemble deep random vector functional link network (edRVFL), was proposed in [8]: the authors convert the single-hidden-layer RVFL into a deep version and employ the idea of ensemble learning to reduce the computational complexity. Similar to conventional neural networks, the edRVFL network consists of an input layer, an output layer, and several hidden layers. The hidden weights and biases in this network are randomly generated and do not need to be trained. The uniqueness of this framework is that each hidden layer is treated as an independent classifier, just like a single RVFL network. Eventually, the final output is obtained by fusing the outputs of all these classifiers.

Ensemble learning methods are widely used in classification problems; they combine multiple models for prediction to overcome the weaknesses of each single learning algorithm [19]. Among them, bagging [1], boosting [4], and stacking [18] are the three most popular and successful methods. Adaboost was originally proposed to improve the performance of decision trees [3]. It combines several weak classifiers to obtain a strong classifier: samples misclassified by previous classifiers are given greater importance in the following classifiers, and a classifier with higher accuracy is assigned a higher weight in the final prediction. In this paper, we introduce two novel methods that use Adaboost to build ensembles of RVFL and of its deep version. We call them adaptive ensemble random vector functional link networks (ada-eRVFL) and adaptive ensemble deep random vector functional link networks (ada-edRVFL). In ada-eRVFL, each single RVFL network is treated as a weak classifier in Adaboost, whereas in ada-edRVFL, every single hidden layer is treated as a weak classifier.

2 Related Works

2.1 Random Vector Functional Link Networks

The random vector functional link network (RVFL) is a randomization based single-hidden-layer neural network proposed by Pao [12]. The basic structure of RVFL is shown in Fig. 1.

Fig. 1. The structure of the RVFL network. The red lines denote the direct link, which transfers the linear original features to the output layer. (Color figure online)

Both the linear original features and the non-linearly transformed features are conveyed to the output layer through the direct link and the hidden layer, respectively. Therefore, the output weights \(\beta \) can be learned from the following optimization problem:

$$\begin{aligned} O_{RVFL}=\mathop {min}\limits _{\beta }||D\beta -Y||^2+\lambda ||\beta ||^2 \end{aligned}$$
(1)

where D is the concatenation of all linear and non-linear features, and Y contains the true labels of all samples. \(\lambda \) denotes the regularization parameter that controls how much the algorithm penalizes model complexity. This optimization problem can be solved by ridge regression [7], and the solution can be written as follows:

$$\begin{aligned} \text {Primal space: } \beta =(D^TD+{\lambda }I)^{-1}D^TY \end{aligned}$$
(2)
$$\begin{aligned} \text {Dual space: } \beta =D^T(DD^T+{\lambda }I)^{-1}Y \end{aligned}$$
(3)

The computational cost of training the RVFL network is reduced by suitably choosing between the primal and dual solutions [15].
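To make Eqs. (1)–(3) concrete, below is a minimal NumPy sketch of RVFL training that picks the cheaper closed-form solution. The function names, the ReLU activation, and the uniform initialization range are our own illustrative choices, not details taken from [12] or [15]; the random bias term follows the general statement that weights and biases are randomly generated.

```python
import numpy as np

def train_rvfl(X, Y, n_hidden=128, lam=1e-2, rng=None):
    """Minimal RVFL sketch: fixed random hidden layer + ridge output weights."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    W = rng.uniform(-1.0, 1.0, size=(d, n_hidden))  # random hidden weights, never trained
    b = rng.uniform(-1.0, 1.0, size=n_hidden)       # random biases, never trained
    H = np.maximum(0.0, X @ W + b)                  # non-linear features (ReLU as g)
    D = np.hstack([X, H])                           # direct link: linear + non-linear features
    k = D.shape[1]
    if n >= k:  # primal solution, Eq. (2): invert a k x k matrix
        beta = np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ Y)
    else:       # dual solution, Eq. (3): invert an n x n matrix
        beta = D.T @ np.linalg.solve(D @ D.T + lam * np.eye(n), Y)
    return W, b, beta

def predict_rvfl(X, W, b, beta):
    """Class scores; take the argmax over columns for the predicted label."""
    D = np.hstack([X, np.maximum(0.0, X @ W + b)])
    return D @ beta
```

Choosing between primal and dual by comparing the sample count n with the feature count k mirrors the cost argument of [15]: the smaller of the two matrices is the one to invert.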

2.2 Ensemble Deep Random Vector Functional Link Networks

Inspired by other deep learning models, the authors of [8] proposed a deep version of the basic RVFL network and used ensemble learning to improve its performance. The structure of edRVFL is shown in Fig. 2. Let n be the number of hidden neurons and l the index of a hidden layer. The output of the first hidden layer is obtained by:

$$\begin{aligned} H^{(1)}=g(XW^{(1)}),\quad W^{(1)} \in \mathbb {R}^{d\times n} \end{aligned}$$
(4)

where X denotes the input features, d represents the number of features of the input samples, and \(g(\cdot )\) is the non-linear activation function used in each hidden neuron. For layers \(l>1\), similar to the RVFL network, the hidden features of the previous hidden layer and the original features of the input layer are concatenated to generate the next hidden layer, so Eq. 4 becomes:

$$\begin{aligned} H^{(l)}=g([H^{(l-1)},X]W^{(l)}),\quad W^{(l)} \in \mathbb {R}^{(n+d)\times n} \end{aligned}$$
(5)

The edRVFL network likewise treats every hidden layer as a single RVFL classifier. After obtaining predictions from all layers via ridge regression, these outputs are fused by an ensemble method to produce the final output.
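A short sketch of this per-layer scheme, under the same illustrative assumptions as the RVFL sketch above (ReLU for \(g(\cdot )\), uniform random weights, and simple score averaging as the fusion rule):

```python
import numpy as np

def edrvfl_fit_predict(X, Y, n_hidden=128, n_layers=4, lam=1e-2):
    """edRVFL sketch: stack hidden layers per Eqs. (4)-(5); each layer
    trains its own ridge-regression classifier, and scores are fused."""
    rng = np.random.default_rng(0)
    n, d = X.shape
    H, scores = X, []
    for l in range(n_layers):
        in_dim = d if l == 0 else n_hidden + d
        W = rng.uniform(-1.0, 1.0, size=(in_dim, n_hidden))
        inp = X if l == 0 else np.hstack([H, X])   # Eq. (5): previous layer + raw input
        H = np.maximum(0.0, inp @ W)               # Eq. (4) for the first layer
        D = np.hstack([X, H])                      # direct link for this layer's classifier
        beta = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ Y)
        scores.append(D @ beta)                    # this layer's prediction scores
    return np.mean(scores, axis=0)                 # fusion by averaging (one simple choice)
```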

2.3 AdaBoost

Boosting has proven successful in solving classification problems. It was first introduced in [3] through the Adaboost algorithm, which was originally proposed for two-class classification problems and for improving the performance of decision trees. The main idea of Adaboost is to approximate the Bayes classifier by combining several weak classifiers. The algorithm is an iterative procedure: it starts by using unweighted samples to build the first classifier, and in each following step, the weights of the samples misclassified by the previous classifier are boosted, giving those samples higher importance in the error calculation. After several repetitions, weighted majority voting combines the outputs of all classifiers into the final output.

Fig. 2. The structure of the edRVFL network. Each single layer is treated as an independent RVFL network. The final output is obtained by combining the predictions from all the classifiers.

In [5], the authors developed a new algorithm called Stagewise Additive Modeling using a Multi-class Exponential loss function (SAMME), which directly extends Adaboost to the multi-class case without decomposing it into multiple two-class problems.
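For later reference, SAMME assigns the m-th weak classifier the weight

$$\begin{aligned} \alpha ^{(m)} = \log \frac{1-err^{(m)}}{err^{(m)}} + \log (K-1) \end{aligned}$$

where K is the number of classes; according to [5], for \(K=2\) this reduces to the original Adaboost weight.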

3 Method

In this section, we propose the adaptive ensemble random vector functional link network (ada-eRVFL). ada-eRVFL is inspired by the adaptive boosting method. For a data set \(\textit{\textbf{x}}\), the weights \(\alpha \) are assigned to the sub-classifiers according to the error function:

$$\begin{aligned} err^{(m)} = \sum _{i=1}^n \omega _i \mathbbm {1}(c_i \not = R^{(m)}(x_i)) \Big / \sum _{i=1}^n \omega _i \end{aligned}$$
(6)

where \(R^{(m)}\) represents the m-th single RVFL classifier, \(c_i\) is the true class of sample \(x_i\), and \(\omega \) are the sample weights, updated as follows:

$$\begin{aligned} \omega _i \leftarrow \omega _i\cdot \exp \left( \alpha ^{(m)}\cdot \mathbbm {1}\left( c_i \not = R^{(m)}(x_i)\right) \right) \end{aligned}$$
(7)

It is worth noting that the sample weights only contribute to computing the error function of each sub-classifier; the training phase of the weak RVFL classifiers uses only the original, unweighted samples. The weak RVFL classifiers in the ensemble are therefore individual and independent, while misclassified samples are assigned larger weights.

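To make the procedure concrete, below is a minimal sketch of the ada-eRVFL training loop as we read Eqs. (6)–(7) together with the SAMME classifier weight. `train_rvfl` and `predict_rvfl` are the hypothetical helpers from the RVFL sketch in Sect. 2.1, and all names and defaults are ours.

```python
import numpy as np

def ada_ervfl_fit(X, y, K, M=32, **rvfl_kwargs):
    """ada-eRVFL sketch: M independent RVFL weak classifiers combined with
    SAMME-style weights. As stated above, the sample weights enter only the
    error computation; every RVFL is trained on the unweighted data."""
    n = X.shape[0]
    Y = np.eye(K)[y]                       # one-hot targets for ridge regression
    w = np.ones(n) / n                     # sample weights
    models, alphas = [], []
    for m in range(M):
        rng = np.random.default_rng(m)     # fresh random hidden weights per classifier
        W, b, beta = train_rvfl(X, Y, rng=rng, **rvfl_kwargs)
        miss = (predict_rvfl(X, W, b, beta).argmax(axis=1) != y).astype(float)
        err = np.clip((w * miss).sum() / w.sum(), 1e-12, 1 - 1e-12)  # Eq. (6)
        if 1.0 - err <= 1.0 / K:           # classifier weight would not be positive
            continue
        alpha = np.log((1 - err) / err) + np.log(K - 1)  # SAMME weight
        w = w * np.exp(alpha * miss)       # Eq. (7): boost misclassified samples
        models.append((W, b, beta))
        alphas.append(alpha)
    return models, alphas

def ada_ervfl_predict(X, models, alphas, K):
    """Weighted majority vote over the retained weak classifiers."""
    votes = np.zeros((X.shape[0], K))
    for (W, b, beta), a in zip(models, alphas):
        pred = predict_rvfl(X, W, b, beta).argmax(axis=1)
        votes[np.arange(X.shape[0]), pred] += a
    return votes.argmax(axis=1)
```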

Another proposed variant is the adaptive ensemble deep random vector functional link network (ada-edRVFL). The deep RVFL network consists of several hidden layers instead of the single hidden layer in RVFL, and each hidden layer constitutes a sub-classifier together with the input layer, the output layer, and the direct link between them.

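The same boosting loop carries over; the only change is that the weak classifiers are the successive per-layer classifiers of one deep network rather than independent RVFLs. A sketch under the same assumptions as above (prediction mirrors `ada_ervfl_predict`):

```python
import numpy as np

def ada_edrvfl_fit(X, y, K, n_hidden=128, n_layers=32, lam=1e-2):
    """ada-edRVFL sketch: boost the per-layer classifiers of one deep RVFL
    stack in depth order, reusing Eqs. (5)-(7) and the SAMME weight."""
    rng = np.random.default_rng(0)
    n, d = X.shape
    Y = np.eye(K)[y]
    w = np.ones(n) / n
    H, layers, alphas = X, [], []
    for l in range(n_layers):
        in_dim = d if l == 0 else n_hidden + d
        W = rng.uniform(-1.0, 1.0, size=(in_dim, n_hidden))
        inp = X if l == 0 else np.hstack([H, X])
        H = np.maximum(0.0, inp @ W)                  # Eq. (5)
        D = np.hstack([X, H])                         # direct link
        beta = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ Y)
        miss = ((D @ beta).argmax(axis=1) != y).astype(float)
        err = np.clip((w * miss).sum() / w.sum(), 1e-12, 1 - 1e-12)  # Eq. (6)
        if 1.0 - err <= 1.0 / K:
            continue                                  # the stack still grows, but this layer gets no vote
        alpha = np.log((1 - err) / err) + np.log(K - 1)
        w = w * np.exp(alpha * miss)                  # Eq. (7)
        layers.append((W, beta))
        alphas.append(alpha)
    return layers, alphas
```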

For both proposed methods, we need to ensure that the classifier weights \(\alpha \) are positive; thus it is required that \(1-err^{(m)}> 1/K\), where K is the number of classes.
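This condition follows directly from the SAMME weight given in Sect. 2.3:

$$\begin{aligned} \alpha ^{(m)}> 0 \;\Leftrightarrow \; \frac{(K-1)(1-err^{(m)})}{err^{(m)}}> 1 \;\Leftrightarrow \; 1-err^{(m)} > \frac{1}{K} \end{aligned}$$

that is, each weak classifier only needs to perform better than random guessing over the K classes.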

4 Experiments

The experiments are performed on 9 classification datasets selected from the UCI Machine Learning Repository [2], covering both binary and multi-class problems, with sample sizes and feature dimensions ranging from hundreds to thousands. The details of the datasets are stated below. In the interest of a fair comparison, we use exactly the same validation and test subsets and the same data pre-processing method as in [9].

For the RVFL based methods, the regularization parameter \(\lambda \) is set to \(\frac{1}{C}\), where C is chosen from \(\{2^x : x = -6,-4,\dots ,10,12\}\). Depending on the size of the dataset, the number of hidden neurons is tuned from \(\{2^2,2^3,\dots ,2^{10},2^{11}\}\).

The edRVFL based methods use the same search ranges for \(\lambda \) and the number of hidden neurons. In addition, the maximum number of hidden layers is set to 32, which is also the number of sub-classifiers in ada-edRVFL.
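For clarity, the two search grids can be written out as follows (our own snippet, restating the ranges above):

```python
# lambda = 1/C with C in {2^-6, 2^-4, ..., 2^10, 2^12} (step of 2 in the exponent)
C_grid = [2.0 ** x for x in range(-6, 13, 2)]
# hidden neurons in {2^2, 2^3, ..., 2^10, 2^11}
neuron_grid = [2 ** x for x in range(2, 12)]
```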

Table 1. Accuracy (%) and average rank of the compared approaches

From Table 1, ada-edRVFL achieves the best performance on all datasets. On average, ada-edRVFL improves accuracy by over two percentage points. This suggests that the proposed adaptive ensemble method can effectively boost the performance of weak classifiers, and that it makes the single RVFL network competitive with the deep RVFL network.

5 Conclusion

We proposed an adaptive ensemble method for random vector functional link networks. Our framework outperforms the edRVFL baseline, and the proposed method can be applied to different RVFL based networks. After applying the proposed ensemble method, the performance of the ensemble RVFL can compete with that of the deep RVFL network. Specifically, we tested the proposed method on 9 UCI classification datasets, and the experimental results show that the proposed ensemble variant is both effective and general.