1 Introduction

This paper deals with multi-label document classification by neural networks. Formally, the task can be seen as the problem of finding a model M which assigns to a document \(d \in D\) a set of appropriate labels \(l \subseteq L\), i.e. \(M: d \rightarrow l\), where D is the set of all documents and L is the set of all possible document labels. In our previous work [1], we compared standard feed-forward networks (i.e. multi-layer perceptrons) and popular convolutional networks (CNNs).

The resulting F-measures of these nets were high; however, the values are still far from perfect. Therefore, in this paper we use several approaches to combine the individual networks in order to improve the final classification score. The main contribution of this paper thus consists in a comparison of classifier combination methods for multi-label classification, which has, to the best of our knowledge, never been done for this task before. The methods are evaluated on documents in the Czech language, a representative of the highly inflectional Slavic languages with free word order. These properties decrease the performance of the usual methods, so a more sophisticated parametrization is beneficial. This evaluation is another contribution of this paper.

The rest of the paper is organized as follows. Section 2 describes the combination methods. Section 3 deals with experiments carried out on the ČTK corpus and then discusses the obtained results. In the last section, we summarize the experimental results and propose some future research directions.

2 Networks and Combination Approaches

2.1 Individual Nets

We use a feed-forward deep neural network (FDNN) and a convolutional neural network (CNN), each with two different activation functions in the output layer, namely sigmoid and softmax. Our CNN is motivated by Kim [2]; however, we use only a one-dimensional convolution kernel. The topologies of our nets are detailed in our previous work [1].
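To make the architecture concrete, here is a minimal Keras sketch of a Kim-style CNN with a single one-dimensional convolution and a sigmoid output layer for multi-label classification. All hyperparameter values (vocabulary size, embedding dimension, sequence length, filter count and kernel width) are illustrative assumptions; the actual topologies are detailed in [1].

```python
# A minimal sketch of a Kim-style text CNN with one 1-D convolution.
# All hyperparameter values below are illustrative assumptions, not
# the topology from [1].
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

model = Sequential()
model.add(Embedding(input_dim=20000, output_dim=100, input_length=400))
model.add(Conv1D(filters=128, kernel_size=5, activation='relu'))
model.add(GlobalMaxPooling1D())
model.add(Dense(37, activation='sigmoid'))  # one output per category
model.compile(optimizer='adam', loss='binary_crossentropy')
```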

2.2 Combination

We assume that the different nets retain complementary information which can compensate for recognition errors. We also assume that similar network topologies with different activation functions can contribute different information, so that every net should have its particular impact on the final classification. Therefore, we treat all the nets as different classifiers to be combined.

Two types of combination are evaluated and compared: the first does not need any training phase, while the second learns a classifier.

Unsupervised Combination. The first combination method compensates for the errors of the individual classifiers by averaging their output scores. The averaged scores are subsequently thresholded to obtain the final classification result. This method is hereafter called Averaged thresholding.
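A minimal sketch of this rule in Python/NumPy, assuming the scores of the n nets are stacked in a single array; the default threshold of 0.5 is only a placeholder, since the actual value is tuned on the development set (see Sect. 3.1):

```python
import numpy as np

def averaged_thresholding(scores, threshold=0.5):
    """Average per-label scores over nets, then threshold.

    scores: array of shape (n_nets, n_docs, n_labels) with net outputs.
    Returns a binary (n_docs, n_labels) label-assignment matrix.
    """
    mean_scores = scores.mean(axis=0)  # average over the n nets
    return (mean_scores >= threshold).astype(int)
```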

The second combination approach first thresholds the scores of all individual classifiers. The final classification output is then given by the agreement of a majority of the classifiers. We hereafter call this method Majority voting with thresholding.
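The corresponding sketch for this rule, under the same assumptions about array shapes and the placeholder threshold:

```python
import numpy as np

def majority_voting(scores, threshold=0.5):
    """Threshold each net's scores, then assign labels by majority vote.

    scores: array of shape (n_nets, n_docs, n_labels) with net outputs.
    Returns a binary (n_docs, n_labels) label-assignment matrix.
    """
    votes = (scores >= threshold).astype(int)  # per-net binary decisions
    n_nets = scores.shape[0]
    return (votes.sum(axis=0) > n_nets / 2).astype(int)
```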

Supervised Combination. We use another neural network, a multi-layer perceptron, to combine the results. This network has three layers: \(n \times 37\) inputs, a hidden layer with 512 nodes, and an output layer composed of 37 neurons (the number of categories to classify), where n is the number of nets to combine. This configuration was set experimentally based on preliminary results. As in the case of the individual classifiers, we also evaluate and compare two different activation functions: sigmoid and softmax. These combination approaches are hereafter called FNN with sigmoid and FNN with softmax.
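A minimal Keras sketch of this combining network follows. The layer sizes match the description above; the hidden-layer activation, optimizer, and loss are not specified here and are our assumptions, made only to keep the example runnable.

```python
from keras.models import Sequential
from keras.layers import Dense

def build_combiner(n_nets, n_labels=37, output_activation='sigmoid'):
    """Combining FNN: n x 37 concatenated net scores in, 37 labels out.

    The 512-node hidden layer and the 37 outputs follow the text above;
    relu, adam and binary cross-entropy are assumptions.
    """
    model = Sequential()
    model.add(Dense(512, activation='relu', input_dim=n_nets * n_labels))
    model.add(Dense(n_labels, activation=output_activation))
    model.compile(optimizer='adam', loss='binary_crossentropy')
    return model
```

The input vector for each document is the concatenation of the 37-dimensional score vectors produced by the n individual nets.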

3 Experiments

3.1 Tools and Corpus

All neural nets were implemented with the Keras toolkit [3], which is based on the Theano deep learning library [4].

For the following experiments we used the Czech text documents provided by the ČTK. The whole corpus contains 2,974,040 words belonging to 11,955 documents. The documents are annotated with labels from a set of 60 categories (for instance agriculture, weather, politics, or sport), of which we used the 37 most frequent ones. We further created a development set composed of 500 randomly chosen samples removed from the entire corpus. This corpus is freely available for research purposes at http://home.zcu.cz/~pkral/sw/.

We use a five-fold cross-validation procedure for all following experiments, where 20% of the corpus is reserved for testing and the remaining part for training our models. The optimal value of the threshold is determined on the development set. For evaluation of the multi-label document classification results, we use the standard recall, precision, and F-measure (F1) metrics. The results are micro-averaged.
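For clarity, a short sketch of the micro-averaged metrics computed directly from binary label-indicator matrices; the function and variable names are ours:

```python
import numpy as np

def micro_metrics(y_true, y_pred):
    """Micro-averaged precision, recall and F1.

    y_true, y_pred: binary matrices of shape (n_docs, n_labels).
    Counts are pooled over all documents and labels before averaging.
    """
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```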

3.2 Results of the Individual Networks

The first experiment (see Sect. 1 of Table 1) shows the results of the individual neural nets with sigmoid and softmax activation functions. These results demonstrate very good classification performance for all individual networks.

Table 1. Experimental results

3.3 Results of Unsupervised Combinations

The second experiment (see Sect. 2 of Table 1) shows the results of the Averaged thresholding and Majority voting with thresholding methods. These results confirm our assumption that the different nets retain complementary information and that it is useful to combine them to improve the classification scores of the individual networks. The results further show that the performance of the two methods is comparable.

Note that due to space limits, only the best-performing combination for each method is reported in the table.

3.4 Results of Supervised Combinations

The following experiments show the results of the supervised combination method with an FNN (see Sect. 2.2). We have evaluated and compared the nets with both sigmoid and softmax activation functions (see Sect. 3 of Table 1).

These results show that this combination also has a positive impact on the classification and that the sigmoid activation function brings better results than softmax. Moreover, as expected, the supervised combination slightly outperforms both previously described unsupervised methods.

4 Conclusions and Future Work

In this paper, we have used several combination methods to improve the results of individual neural nets for multi-label classification of Czech text documents. We have shown that combining the nets is useful for improving the classification scores of the individual networks. We have also shown that thresholding is a good way to assign document labels in multi-label classification. We have further shown that the results of all the approaches are comparable; however, the best combination method is the supervised one, which uses an FNN with a sigmoid activation function. The F-measure of this approach is 85.3%.

We further analyzed the final results and discovered that the classification could still be improved if the number of classes were known for every document. Therefore, our first perspective is to build a meta-classifier to provide this information; the subsequent multi-label classification will then use class-dependent thresholds. The next perspective consists in proposing a novel combination method based on a deep neural network. The main challenge of this work will be to find an optimal network topology with a reasonable number of parameters in order to avoid overfitting. We would also like to experiment with confidence measures to improve the final classification results.