Abstract
The notion of incremental learning is to train an ANN algorithm in stages, as and when newer training data arrives. Incremental learning has become widespread in recent times with the advent of deep learning. Noise in the training data reduces the accuracy of the algorithm. In this paper, we make an empirical study of the effect of noise in the training phase. We numerically show that the accuracy of the algorithm depends more on the location of the error than on the percentage of error. Using perceptron, feedforward neural network and radial basis function neural network, we show that for the same percentage of error, the accuracy of the algorithm varies significantly with the location of error. Furthermore, our results show that the dependence of the accuracy on the location of error is independent of the algorithm. However, the slope of the degradation curve decreases with more sophisticated algorithms.
1 Introduction
Machine learning (ML) algorithms aim to train a computer to make decisions [1]. These algorithms are used in various fields such as image processing [2, 3], object recognition [4], handwriting recognition [5], natural language processing [6] and even quantum computing [7]. There are two major techniques for decision making used by ML algorithms—(i) supervised learning where the algorithm is initially trained with a set of labeled data [8], and (ii) unsupervised learning where the algorithm looks for patterns and similarities and tries to group similar data in the same cluster without any prior learning phase [9]. Although these two are the major types of learning algorithms, other forms of learning algorithms such as semi-supervised learning [10] and reinforcement learning [11] are also widely studied.
Artificial neural network (ANN) algorithms are supervised algorithms which mimic the working principles of neurons in the human brain. The most basic ANN algorithm is the single-layer perceptron. More sophisticated algorithms use multiple layers and more complicated decision functions, and take the error of the output into consideration in order to update the weights. Although these algorithms assume a single initial training phase, in the real world data often arrives in batches, which requires the training to be performed in multiple stages. The model is first trained with the initial set of training data, and when more training data becomes available, the model is trained further. Such a model of training is called incremental learning [12].
In this paper, we study incremental learning in ANN where the training dataset may be noisy, i.e., some of the training samples carry incorrect labels. If the training dataset is erroneous, the training is naturally less effective and the ANN is less accurate in its predictions. We show, however, that not only the number of erroneous samples but also the location of the errors in incremental learning plays a vital role in the performance of the algorithm. The accuracy of an algorithm varies even when the percentage of error is the same but the errors are concentrated in different locations of the training set.
We show numerically, using three ANN algorithms (perceptron [13], feedforward neural network (FFN) [14, 15] and radial basis function neural network (RBF) [15]), that (i) for the same percentage of erroneous data, the location of error clusters can significantly alter the performance of the algorithm, (ii) the performance degradation is independent of the number of features per sample and (iii) although more sophisticated algorithms are more robust to errors, the degradation in performance due to concentrated error has a similar nature for all algorithms.
The rest of the paper is organized as follows—in Sect. 2, we give a brief description of the three ANN algorithms used. In Sects. 3 and 4, we show the performance of the algorithms for two-step and three-step incremental learning. We conclude in Sect. 5.
2 Brief Review of the ANN Algorithms Used
An m-class classification problem [16], where C1, C2, …, Cm are the object classes, is associated with k training samples and n testing samples. Each training sample is a vector (si, Csi), where Csi is the designated class of the sample si. After training the algorithm with this set of training data, for each testing sample ti, such that ti belongs to class Cj, the algorithm is expected to produce Prob(ti ∈ Cj) > Prob(ti ∈ Cq) for all \(q \ne j\).
The algorithm is said to have made an error in the prediction if, for some testing sample tp ∈ Cp, it produces Prob(tp ∈ Cp) < Prob(tp ∈ Cq) for some \(q \ne p\). The objective of learning is to minimize the number of errors. In the rest of this section, we briefly discuss the working principles of the three ANN algorithms (perceptron, FFN and RBF) used in this paper. Our motivation behind using these three algorithms is to show that although sophistication of the algorithm enhances the robustness to training errors, the performance loss due to the location of error concentration remains invariant under the type of algorithm used.
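The decision rule and the error count described above can be sketched in a few lines. This is only an illustration: the function names `predict` and `count_errors` and the probability values are hypothetical, and the class with the largest predicted probability is taken as the prediction.

```python
def predict(probabilities):
    """Return the index of the class with the highest predicted probability."""
    return max(range(len(probabilities)), key=lambda j: probabilities[j])


def count_errors(prob_rows, true_classes):
    """Count testing samples whose predicted class differs from the true class."""
    return sum(1 for probs, c in zip(prob_rows, true_classes) if predict(probs) != c)


# Hypothetical class probabilities for three testing samples and m = 3 classes.
prob_rows = [
    [0.7, 0.2, 0.1],  # true class 0, predicted 0
    [0.1, 0.3, 0.6],  # true class 2, predicted 2
    [0.5, 0.4, 0.1],  # true class 1, predicted 0 (an error)
]
true_classes = [0, 2, 1]

errors = count_errors(prob_rows, true_classes)
accuracy = 1 - errors / len(true_classes)
```

Here the third sample is misclassified because the probability assigned to its true class is lower than the probability assigned to another class.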
Perceptron Learning Algorithm. Perceptron is one of the simplest ANN algorithms. In this algorithm, an input is a vector (x1, x2, …, xn), where each xi is called a feature and associated with each feature is a weight wi. For a 2-class classification problem, the output
\(y = 1\) if \(\sum_{i=1}^{n} w_i x_i > \theta\), and \(y = 0\) otherwise,
where θ is a threshold value. The weights wi are initialized randomly and are modified during the training phase to match the class labels of each training sample. This algorithm can be easily modified for multi-class classification.
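A minimal sketch of such a 2-class perceptron follows. It assumes the common threshold rule (output 1 when the weighted sum exceeds θ, else 0) and the classical perceptron update. The learning rate, the number of epochs, the initialization range and the toy AND dataset are illustrative choices, not the settings used in the experiments.

```python
import random


def perceptron_output(weights, theta, x):
    """Return 1 if the weighted sum of the features exceeds the threshold, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > theta else 0


def train_perceptron(weights, theta, samples, lr=0.1, epochs=100):
    """Apply the perceptron rule over `samples`; return the updated (weights, theta)."""
    weights = list(weights)
    for _ in range(epochs):
        for x, label in samples:
            err = label - perceptron_output(weights, theta, x)  # -1, 0 or +1
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            theta -= lr * err
    return weights, theta


# Random initialization (small range chosen for illustration).
rng = random.Random(0)
init_weights = [rng.uniform(-0.05, 0.05) for _ in range(2)]
init_theta = rng.uniform(-0.05, 0.05)

# Hypothetical linearly separable toy data: the logical AND of two features.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, theta = train_perceptron(init_weights, init_theta, and_samples)
predictions = [perceptron_output(weights, theta, x) for x, _ in and_samples]
```

Because `train_perceptron` takes the current weights and threshold as arguments, calling it again with the returned values and a new batch of samples continues the training, which is the incremental setting studied in this paper.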
Feedforward Neural Network Algorithm. In FFN, the perceptrons are arranged in layers, with the first layer taking in inputs and the last layer producing outputs. The middle layers are called hidden layers. Each perceptron in one layer is connected to every perceptron on the next layer, but there is no interconnection among the perceptrons of the same layer. The information is constantly fed forward from one layer to the next. A single perceptron can classify points into two regions which are linearly separable, whereas by varying the number of layers, the number of input, output and hidden nodes, one can classify points in arbitrary dimension into an arbitrary number of groups.
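The layered structure can be illustrated with a tiny network whose weights are picked by hand rather than learned. The sketch below assumes step activations and one hidden layer with two units; it computes XOR, which a single perceptron cannot represent because the two classes are not linearly separable. All weights and names are hypothetical.

```python
def step(z):
    """Threshold activation: 1 for positive input, 0 otherwise."""
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """Feed `inputs` through one fully connected layer.

    Each row of `weights` holds the incoming weights of one unit of the layer.
    """
    return [
        step(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked (not trained) parameters.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [-0.5, -1.5]  # hidden unit 0 acts as OR, hidden unit 1 as AND
OUT_W = [[1.0, -2.0]]    # output fires for "OR but not AND"
OUT_B = [-0.5]


def feedforward(x):
    """Pass the input through the hidden layer, then through the output layer."""
    hidden = layer(x, HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUT_W, OUT_B)[0]


xor_outputs = [feedforward(x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Information flows only from the input to the hidden layer and from the hidden layer to the output; units within a layer do not interact.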
Radial Basis Function Algorithm. A drawback of perceptron is that the activation function is linear and hence fails to classify nonlinearly separated data. The RBF algorithm uses radial basis functions as activation functions. A radial basis function is a real-valued function \(\varphi\) whose value depends only on the distance from the origin, i.e., \(\varphi \left( x \right) = \varphi \left( {\left\| x \right\|} \right)\). Its characteristic feature is that the response decreases or increases monotonically with distance from a central point. A typical radial function is the Gaussian function. The radial basis functions transform the input signal into a different form, which can then be fed to the ANN to get linear separability. An RBF network has an input layer, a single hidden layer and an output layer. The sophistication of the activation function of this algorithm usually leads to better classification accuracy than perceptron or FFN.
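As an illustration of how a radial transformation can yield linear separability, the sketch below maps the four XOR points through two Gaussian units and then applies a plain threshold to the sum of the hidden responses. The centers, the width `sigma` and the threshold are hand-picked assumptions, and the distance is measured from each unit's center rather than from the origin.

```python
import math


def gaussian_rbf(x, center, sigma=1.0):
    """Gaussian radial basis function; depends only on the distance ||x - center||."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Hand-picked centers of the two hidden units (illustrative, not learned).
CENTERS = [(0.0, 0.0), (1.0, 1.0)]


def hidden_layer(x):
    """Responses of the RBF hidden units for input x."""
    return [gaussian_rbf(x, c) for c in CENTERS]


def classify(x, threshold=1.3):
    """Linear decision in the hidden space: class 1 if the summed response is low."""
    return 1 if sum(hidden_layer(x)) < threshold else 0


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
rbf_outputs = [classify(x) for x in points]
```

The points (0, 0) and (1, 1) each sit on a center, so their summed response (about 1.37) is higher than that of (0, 1) and (1, 0) (about 1.21), and a single threshold separates the two XOR classes.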
3 Performance of ANN in Noisy Two-Step Incremental Learning
We have performed our study on the standard benchmark dataset IRIS [17], which contains 120 data samples, where each sample is a vector containing four features. Forty samples have been used to train each ANN algorithm, while the other 80 have been used to test their performance.
Error in training data implies that for a particular training sample si ∈ Ci, the class is incorrectly labeled as Cj, j ≠ i. For this dataset, containing 2n training samples (here n = 20), each ANN algorithm is trained twice sequentially, first with the first n and then with the last n training samples. We have varied the errors in the training set from 0 to 50% in steps of 10%. Three scenarios are considered, where the erroneous data is (i) uniformly distributed in the entire training set, (ii) uniformly distributed in the first half of the training set and (iii) uniformly distributed in the second half of the training set.
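The three noise scenarios can be sketched as follows. This is not the procedure used to produce the reported results: the helper `inject_label_noise` is hypothetical, the labels are synthetic placeholders, and the error percentage is assumed to be measured against the whole training set, so that 50% error confined to one half mislabels every sample of that half.

```python
import random


def inject_label_noise(labels, error_fraction, region, num_classes, rng):
    """Return a copy of `labels` with wrong classes at random positions in `region`.

    `error_fraction` is relative to the whole training set (an assumption).
    `region` is a (start, stop) pair of indices, stop excluded.
    """
    noisy = list(labels)
    n_errors = round(error_fraction * len(labels))
    start, stop = region
    if n_errors > stop - start:
        raise ValueError("region too small for the requested error fraction")
    for i in rng.sample(range(start, stop), n_errors):
        wrong = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(wrong)  # always differs from the true label
    return noisy


rng = random.Random(0)
labels = [i % 3 for i in range(40)]  # placeholder labels for 2n = 40 samples
n = len(labels) // 2

scenarios = {
    "uniform": (0, 2 * n),
    "first_half": (0, n),
    "second_half": (n, 2 * n),
}
noisy_sets = {
    name: inject_label_noise(labels, 0.30, region, 3, rng)
    for name, region in scenarios.items()
}
```

Each entry of `noisy_sets` contains the same number of mislabeled samples (12 of 40); only their position in the training sequence differs.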
In Table 1, we show the accuracy obtained by perceptron, FFN and RBF algorithms, respectively, as the error in the training sample is varied as discussed above.
4 Performance of ANN in Noisy Three-Step Incremental Learning
We have performed this study on the WINE dataset [17], which contains 150 data samples, where each sample is a vector containing 13 features. The IRIS dataset chosen in the previous two-step experiment does not contain enough samples to effectively divide into three training sets. As such, we chose the WINE dataset for this experiment. For this dataset, the ANN algorithms have been trained using 60 samples and the other 90 have been used to test their performance. The motivation to use these two different datasets (IRIS and WINE) is to show that the performance degradation due to error clusters is independent of the number of features per sample. For the WINE dataset containing 3n training samples, each ANN algorithm is trained thrice sequentially with the first n, second n and the last n training samples. We have varied the errors in the training set from 0 to 40% in steps of 10%. Four scenarios are considered in each case, where the erroneous data is (i) uniformly distributed in the entire training set, (ii) uniformly distributed in the first 20 entries, (iii) uniformly distributed in the second 20 entries and (iv) uniformly distributed in the last 20 entries of the training set.
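The sequential training protocol can be sketched as a loop over consecutive batches, where each stage starts from the weights left by the previous one. The sketch assumes a simple multi-class perceptron and a trivially separable toy dataset; the helper names and the data are hypothetical and stand in for the ANN algorithms and the WINE samples.

```python
def split_into_stages(samples, steps):
    """Split the training set into `steps` consecutive, equally sized batches."""
    size = len(samples) // steps
    return [samples[k * size:(k + 1) * size] for k in range(steps)]


def predict(weights, x):
    """Return the class whose weight vector gives the highest score for x."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=lambda c: scores[c])


def train_stage(weights, batch, epochs=10):
    """One training stage of a multi-class perceptron, starting from `weights`."""
    weights = [list(row) for row in weights]
    for _ in range(epochs):
        for x, label in batch:
            guess = predict(weights, x)
            if guess != label:
                weights[label] = [w + xi for w, xi in zip(weights[label], x)]
                weights[guess] = [w - xi for w, xi in zip(weights[guess], x)]
    return weights


def train_incrementally(samples, steps, num_classes):
    """Train stage by stage; each stage continues from the previous weights."""
    weights = [[0.0] * len(samples[0][0]) for _ in range(num_classes)]
    for batch in split_into_stages(samples, steps):
        weights = train_stage(weights, batch)
    return weights


# Toy data with 3n = 6 samples (n = 2), one-hot features, three classes.
samples = [
    ((1, 0, 0), 0), ((0, 1, 0), 1),
    ((0, 0, 1), 2), ((1, 0, 0), 0),
    ((0, 1, 0), 1), ((0, 0, 1), 2),
]
stages = split_into_stages(samples, 3)
weights = train_incrementally(samples, 3, 3)
accuracy = sum(predict(weights, x) == c for x, c in samples) / len(samples)
```

A model trained this way never revisits an earlier batch, so the position of mislabeled samples in the sequence determines at which stage the errors enter the weights.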
In Fig. 1, we show the graph of the accuracy of the three algorithms for incremental training. The first row shows the performance for two-step training, and the second row shows the same for three-step training. The graphs readily show that the degradation in performance is heavily dependent on the location of error. The performance of all the algorithms, when errors are distributed uniformly or are clustered in the first training set, is almost the same. However, as the errors move to later training sets, the performance of the algorithms improves significantly. Moreover, although the performance of RBF is better than that of FFN, which in turn is better than that of perceptron, the nature of degradation remains similar for all the algorithms, irrespective of their sophistication.
For the three-step learning, we have also studied the accuracy of the said algorithms when the error is uniformly distributed in two of the three steps. The graph of the accuracy in this scenario is shown in Fig. 2. The performance of the algorithms when the error lies in two of the three steps also has a similar nature.
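The possible choices of two noisy steps out of three, together with the training-set positions that may then receive wrong labels, can be enumerated as below. The helper names are hypothetical and the total of 60 training samples follows the setup described in this section.

```python
from itertools import combinations


def stage_ranges(total, steps):
    """Index ranges of the consecutive training batches."""
    size = total // steps
    return [range(k * size, (k + 1) * size) for k in range(steps)]


def error_positions(total, steps, noisy_stages):
    """Indices that may be mislabeled when only `noisy_stages` contain errors."""
    ranges = stage_ranges(total, steps)
    return [i for k in noisy_stages for i in ranges[k]]


# All ways to pick two noisy steps out of three (steps numbered from 0).
pairs = list(combinations(range(3), 2))
eligible = {pair: error_positions(60, 3, pair) for pair in pairs}
```

For example, `eligible[(0, 2)]` lists indices 0 to 19 and 40 to 59, i.e., errors are spread over the first and the last training step while the second step stays clean.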
5 Conclusion
In this paper, we have numerically studied the accuracy of perceptron, FFN and RBF for two-step and three-step incremental learning in the presence of noisy training data. We show that the accuracy of the algorithms depends not only on the percentage of error in the training set, but also on the location of the error concentration. In fact, for the same percentage of error, the location plays a significant role in the accuracy of the algorithms. Moreover, we also show that the nature of degradation due to concentrated error is independent of the number of features in the data. The accuracy obtained from the most basic ANN (perceptron) and more sophisticated ANNs (FFN, RBF) shows that although sophistication makes the algorithm more robust to errors, the nature of the performance degradation due to location of error is similar for all the algorithms. Therefore, the concentration of error is a more acute threat to incremental learning with noisy training data than the percentage of error.
References
Nasrabadi, N.M.: Pattern recognition and machine learning. J. Electron. Imaging 16(4), 049901 (2007)
Rosten, E., Drummond, T.: Machine learning for high-speed corner detection. In: European Conference on Computer Vision, pp. 430–443. Springer (2006)
Dietterich, T.G.: Ensemble methods in machine learning. In: International Workshop on Multiple Classifier Systems, pp. 1–15. Springer (2000)
Duygulu, P., Barnard, K., Freitas, J., Forsyth, D.: Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. In: European Conference on Computer Vision, pp. 97–112. Springer (2002)
Xu, L., Krzyzak, A., Suen, C.Y.: Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. Syst. Man Cybern. 22(3), 418–435 (1992)
Berger, A.L., Pietra, V.J.D., Pietra, S.A.D.: A maximum entropy approach to natural language processing. Comput. Linguist. 22(1), 39–71 (1996)
Rebentrost, P., Mohseni, M., Lloyd, S.: Quantum support vector machine for big data classification. Phys. Rev. Lett. 113(13), 130503 (2014)
Caruana, R., Niculescu-Mizil, A.: An empirical comparison of supervised learning algorithms. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 161–168. ACM (2006)
Barlow, H.B.: Unsupervised learning. Neural Comput. 1(3), 295–311 (1989)
Chapelle, O., Schölkopf, B., Zien, A.: Semi-supervised learning. IEEE Trans. Neural Netw. 20(3), 542–542 (2009)
Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. J. Artif. Intell. Res. 4, 237–285 (1996)
Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2009)
Stephen, I.: Perceptron-based learning algorithms. IEEE Trans. Neural Netw. 50(2), 179 (1990)
Gurney, K.: An Introduction to Neural Networks. CRC Press (2014)
Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall PTR (1994)
Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press (2014)
UCI repository of machine learning databases. https://archive.ics.uci.edu/ml/datasets.php
© 2021 Springer Nature Singapore Pte Ltd.
Cite this paper
Ganguly, S., Chatterjee, A., Bhoumik, D., Majumdar, R. (2021). An Empirical Study of Incremental Learning in Neural Network with Noisy Training Set. In: Das, N.R., Sarkar, S. (eds) Computers and Devices for Communication. CODEC 2019. Lecture Notes in Networks and Systems, vol 147. Springer, Singapore. https://doi.org/10.1007/978-981-15-8366-7_11
DOI: https://doi.org/10.1007/978-981-15-8366-7_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8365-0
Online ISBN: 978-981-15-8366-7