
1 Introduction

Machine learning (ML) algorithms aim to train a computer to make decisions [1]. These algorithms are used in various fields such as image processing [2, 3], object recognition [4], handwriting recognition [5], natural language processing [6] and even quantum computing [7]. There are two major techniques for decision making used by ML algorithms—(i) supervised learning where the algorithm is initially trained with a set of labeled data [8], and (ii) unsupervised learning where the algorithm looks for patterns and similarities and tries to group similar data in the same cluster without any prior learning phase [9]. Although these two are the major types of learning algorithms, other forms of learning algorithms such as semi-supervised learning [10] and reinforcement learning [11] are also widely studied.

Artificial neural network (ANN) algorithms are supervised algorithms which mimic the working principles of neurons in the human brain. The most basic ANN algorithm is the single-layer perceptron. More sophisticated algorithms use multiple layers, more complicated decision functions, and take the error of the output into account when updating the weights. Although these algorithms assume a single initial training phase, in the real world, data often arrives in batches, which requires the training to be performed in multiple stages. The model is first trained with the initial set of training data, and when more training data becomes available, the model is trained further. Such a model of training is called incremental learning [12].

In this paper, we study incremental learning in ANNs where the training dataset may be noisy, i.e., some of the training samples have incorrect labels. Clearly, if the training dataset is erroneous, the training will be less effective and the ANN will be less accurate in its predictions. We show, however, that not only the amount of erroneous data but also the location of the errors in incremental learning plays a vital role in the performance of the algorithm. The accuracy of an algorithm varies even when the percentage of error is the same but the errors are concentrated in different locations of the training set.

In this paper, we show numerically by using three ANN algorithms (perceptron [13], feedforward neural network (FFN) [14, 15] and radial basis function neural network (RBF) [15]) that (i) for the same percentage of erroneous data, the location of error clusters can significantly alter the performance of the algorithm, (ii) the performance degradation is independent of the number of features per data and (iii) although more sophisticated algorithms are more robust to errors, the degradation in performance due to concentrated error has a similar nature for all algorithms.

The rest of the paper is organized as follows—in Sect. 2, we give a brief description of the three ANN algorithms used. In Sects. 3 and 4, we show the performance of the algorithms for two-step and three-step incremental learning. We conclude in Sect. 5.

2 Brief Review of the ANN Algorithms Used

An m-class classification problem [16], where C1, C2, …, Cm are the object classes, is associated with k training samples and n testing samples. Each training sample is a pair (si, Csi), where Csi is the designated class of the sample si. After training the algorithm with this set of training data, for each testing sample ti belonging to class Cj, the algorithm is expected to produce

$$\text{Prob}(t_{i} \in C_{j}) > \text{Prob}(t_{i} \in C_{l}) \quad \forall\, l \ne j.$$
(1)

The algorithm is said to have made an error in its prediction if, for some testing sample tp ∈ Cp, it produces Prob(tp ∈ Cp) < Prob(tp ∈ Cq) for some \(q \ne p\). The objective of learning is to minimize the number of such errors. In the remainder of this section, we briefly discuss the working principles of the three ANN algorithms (perceptron, FFN and RBF) used in this paper. Our motivation behind using these three algorithms is to show that although the sophistication of an algorithm enhances its robustness to training errors, the performance loss due to the location of error concentration remains invariant under the type of algorithm used.
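As a small illustration of this error criterion, the following sketch (our own, not taken from the paper) turns the per-class probabilities produced by a trained classifier into a predicted class and flags a misclassification exactly when the condition in Eq. (1) fails for the true class.

import numpy as np

# Illustrative sketch: the prediction for a test sample is the class with the
# highest probability; an error is recorded when it differs from the true class.
def is_misclassified(class_probs, true_class):
    predicted_class = int(np.argmax(class_probs))  # class with the largest Prob(t in C_j)
    return predicted_class != true_class

# Example: three classes, true class index 1, but class 2 receives the largest probability.
print(is_misclassified(np.array([0.2, 0.3, 0.5]), true_class=1))  # prints True (an error)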

Perceptron Learning Algorithm. The perceptron is one of the simplest ANN algorithms. In this algorithm, an input is a vector (x1, x2, …, xn), where each xi is called a feature, and each feature is associated with a weight wi. For a 2-class classification problem, the output is

$$y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_{i} x_{i} \ge \theta \\ 0 & \text{otherwise} \end{cases}$$
(2)

where θ is a threshold value. The weights wi are initialized randomly and are modified during the training phase so that the output matches the class label of each training sample. This algorithm can easily be extended to multi-class classification.
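A minimal sketch of this training rule is given below; the learning rate, the number of epochs and the threshold update are our own illustrative choices and are not taken from the paper.

import numpy as np

# Minimal perceptron sketch for a 2-class problem implementing Eq. (2).
# Learning rate and epoch count are illustrative assumptions.
def train_perceptron(X, y, epochs=50, eta=0.1):
    rng = np.random.default_rng(0)
    w = rng.standard_normal(X.shape[1])   # weights w_i, initialized randomly
    theta = 0.0                           # threshold
    for _ in range(epochs):
        for x_i, label in zip(X, y):
            output = 1 if np.dot(w, x_i) >= theta else 0
            # weights change only when the prediction disagrees with the label
            w += eta * (label - output) * x_i
            theta -= eta * (label - output)
    return w, theta

def predict_perceptron(X, w, theta):
    return (X @ w >= theta).astype(int)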

Feedforward Neural Network Algorithm. In an FFN, the perceptrons are arranged in layers, with the first layer taking in the inputs and the last layer producing the outputs. The middle layers are called hidden layers. Each perceptron in one layer is connected to every perceptron in the next layer, but there are no interconnections among the perceptrons of the same layer. Information is always fed forward from one layer to the next. A single perceptron can only classify points into two linearly separable regions, whereas by varying the number of layers and the number of input, output and hidden nodes, one can classify points in arbitrary dimensions into an arbitrary number of groups.
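The sketch below illustrates this layered, forward-only flow of information for one hidden layer; the layer sizes and the sigmoid activation are illustrative assumptions rather than the configuration used in the experiments.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass of a feedforward network with one hidden layer: every unit in a
# layer feeds every unit in the next layer, and no unit feeds back or sideways.
def ffn_forward(x, W1, b1, W2, b2):
    hidden = sigmoid(W1 @ x + b1)        # input layer -> hidden layer
    output = sigmoid(W2 @ hidden + b2)   # hidden layer -> output layer (one unit per class)
    return output

# Example with 4 input features, 5 hidden units and 3 output classes.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((5, 4)), np.zeros(5)
W2, b2 = rng.standard_normal((3, 5)), np.zeros(3)
print(ffn_forward(rng.standard_normal(4), W1, b1, W2, b2))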

Radial Basis Function Algorithm. A drawback of the perceptron is that its decision boundary is linear, so it fails to classify data that are not linearly separable. The RBF algorithm uses radial basis functions as activation functions. A radial basis function is a real-valued function \(\varphi\) whose value depends only on the distance from the origin, i.e., \(\varphi(x) = \varphi(\|x\|)\). Its characteristic feature is that the response decreases or increases monotonically with the distance from a central point. A typical radial basis function is the Gaussian function. The RBF layer transforms the input signal into a different representation, which can then be fed to the ANN to obtain linear separability. An RBF network has an input layer, a single hidden layer and an output layer. The sophistication of the activation function usually leads to better classification accuracy than the perceptron or FFN.
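The following sketch shows this hidden-layer transformation for Gaussian radial basis functions; the centres and the width parameter are illustrative assumptions (in practice they are often obtained by clustering the training data), and the transformed features would then be passed to a linear output layer.

import numpy as np

# Gaussian RBF hidden layer: each hidden unit responds according to the distance
# ||x - c_j|| between the input and its centre c_j, so the response depends only
# on that distance and decreases monotonically with it.
def rbf_features(X, centers, gamma=1.0):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # squared distances
    return np.exp(-gamma * d2)  # phi(||x - c||) for every sample/centre pair

# The transformed features are far more often linearly separable than the raw
# inputs and can be classified by a simple linear output layer (e.g. a perceptron).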

3 Performance of ANN in Noisy Two-Step Incremental Learning

We have performed our study on the standard benchmark dataset IRIS [17], which contains 120 data samples, where each sample is a vector containing four features. Forty samples have been used to train each ANN algorithm, while the other 80 have been used to test their performance.

An error in the training data means that for a particular training sample si ∈ Ci, the class is incorrectly labeled as Cj, j ≠ i. For this dataset, containing 2n training samples (here n = 20), each ANN algorithm is trained twice sequentially, first with the first n and then with the last n training samples. We have varied the error in the training set from 0 to 50% in steps of 10%. Three scenarios are considered, where the erroneous data is (i) uniformly distributed over the entire training set, (ii) uniformly distributed in the first half of the training set and (iii) uniformly distributed in the second half of the training set.
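A hedged sketch of this protocol is given below. It flips the labels of a given fraction of the training samples, either across the whole set or only within one half, and then trains incrementally on the two halves. The use of scikit-learn's Perceptron with partial_fit is our illustrative choice and not necessarily the authors' implementation.

import numpy as np
from sklearn.linear_model import Perceptron  # stand-in model; any incrementally trainable ANN works here

# Flip the labels of a fraction of the training samples; `region` is the slice of
# the training set in which the errors are concentrated (the whole set for the
# uniform case). The error rate is taken with respect to the full training set.
# Assumes integer class labels 0, 1, ..., n_classes-1.
def flip_labels(y, error_rate, region, n_classes, rng):
    y_noisy = y.copy()
    candidates = np.arange(len(y))[region]
    n_flip = min(int(error_rate * len(y)), len(candidates))
    flipped = rng.choice(candidates, size=n_flip, replace=False)
    # shift each flipped label by 1..n_classes-1 so it always becomes a wrong class
    y_noisy[flipped] = (y_noisy[flipped] + rng.integers(1, n_classes, n_flip)) % n_classes
    return y_noisy

def two_step_accuracy(X_train, y_train, X_test, y_test, error_rate, region, rng):
    classes = np.unique(y_train)
    y_noisy = flip_labels(y_train, error_rate, region, len(classes), rng)
    n = len(X_train) // 2
    model = Perceptron()
    model.partial_fit(X_train[:n], y_noisy[:n], classes=classes)  # first training step
    model.partial_fit(X_train[n:], y_noisy[n:])                   # second training step
    return model.score(X_test, y_test)

For example, region=slice(0, n) places all the errors in the first half, region=slice(n, None) places them in the second half, and region=slice(None) corresponds to the uniform case.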

In Table 1, we show the accuracy obtained by the perceptron, FFN and RBF algorithms as the error in the training set is varied as discussed above.

Table 1 Accuracy of perceptron, FFN and RBF in the presence of clustered noise

4 Performance of ANN in Noisy Three-Step Incremental Learning

We have performed this study on the WINE dataset [17], which contains 150 data samples, where each sample is a vector containing 13 features. The IRIS dataset used in the two-step experiment does not contain enough samples to be divided effectively into three training sets, so we chose the WINE dataset for this experiment. For this dataset, the ANN algorithms have been trained using 60 samples, and the other 90 have been used to test their performance. The motivation for using two different datasets (IRIS and WINE) is to show that the performance degradation due to error clusters is independent of the number of features per sample. For the WINE dataset containing 3n training samples (here n = 20), each ANN algorithm is trained thrice sequentially, with the first n, the second n and the last n training samples. We have varied the error in the training set from 0 to 40% in steps of 10%. Four scenarios are considered, where the erroneous data is (i) uniformly distributed over the entire training set, (ii) uniformly distributed in the first 20 entries, (iii) uniformly distributed in the second 20 entries and (iv) uniformly distributed in the last 20 entries of the training set.
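The two-step sketch above extends directly to this setting by splitting the training set into three equal parts and performing one incremental training step per part; the snippet below (again an illustration, not the authors' code) makes that generalization explicit for an arbitrary number of steps.

import numpy as np
from sklearn.linear_model import Perceptron

# Incremental training in k sequential steps: split the (possibly noisy) training
# set into k equal parts and call partial_fit once per part, in order.
def k_step_accuracy(X_train, y_noisy, X_test, y_test, k=3):
    classes = np.unique(y_noisy)
    model = Perceptron()
    for X_part, y_part in zip(np.array_split(X_train, k), np.array_split(y_noisy, k)):
        model.partial_fit(X_part, y_part, classes=classes)
    return model.score(X_test, y_test)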

In Fig. 1, we show the accuracy of the three algorithms for incremental training. The first row shows the performance for two-step training, and the second row shows the same for three-step training. The graphs readily show that the degradation in performance depends heavily on the location of the errors. The performance of all the algorithms is almost the same when the errors are distributed uniformly or are clustered in the first training set. However, as the errors move to later training sets, the performance of the algorithms degrades significantly. Moreover, although the performance of RBF is better than that of FFN, which in turn is better than that of the perceptron, the nature of the degradation remains similar for all the algorithms, irrespective of their sophistication.

Fig. 1 Accuracy of perceptron, FFN and RBF for two- and three-step incremental learning

For the three-step learning, we have also studied the accuracy of the said algorithms when the error is uniformly distributed over two of the three steps. The accuracy in this scenario is shown in Fig. 2. The performance of the algorithms with errors in two of the three parts shows a similar nature.

Fig. 2 Accuracy of ANN algorithms when errors are uniformly distributed in two of the three steps

5 Conclusion

In this paper, we have numerically studied the accuracy of the perceptron, FFN and RBF for two-step and three-step incremental learning in the presence of a noisy dataset. We show that the accuracy of the algorithms depends not only on the percentage of error in the training set but also on the location of the error concentration. In fact, for the same percentage of error, the location plays a significant role in the accuracy of the algorithms. Moreover, we also show that the nature of the degradation due to concentrated error is independent of the number of features in the data. The accuracies obtained from the most basic ANN (perceptron) and the more sophisticated ANNs (FFN, RBF) show that although sophistication makes an algorithm more robust to errors, the nature of the performance degradation due to the location of error is similar for all the algorithms. Therefore, the concentration of error is a more acute threat to incremental learning with noisy training data than the percentage of error.