Abstract
Convolutional Neural Networks (CNNs) perform well on document image classification tasks, achieving prediction accuracies comparable to those of state-of-the-art neural networks. In this work, we investigate the performance of three CNN architectures pre-trained on ImageNet, namely NASNet-Large, InceptionV3 and EfficientNetB3, for efficient document image classification. We then combine these architectures in an ensemble to further improve classification performance. The ensemble uses soft voting, a simple and effective strategy that averages the class probabilities predicted by the individual models. The experiments are conducted on document images submitted to the Kocaeli University application system for master's degree admission and for undergraduate transfer between programs. The experimental results show that soft voting outperforms the individual CNN architectures, reaching an F-score of 94.04% even when the training data is limited.
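The soft voting strategy described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the probability values are hypothetical stand-ins for the softmax outputs of the three fine-tuned CNNs on one document image, and the optional per-model `weights` parameter is an assumption (the abstract does not say whether the models were weighted). Hard (majority) voting is included only to show where the two strategies can disagree.

```python
from collections import Counter
from typing import List, Optional, Sequence


def soft_vote(model_probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Soft voting: (weighted) average of per-class probabilities.

    model_probs[m][c] is model m's predicted probability for class c
    on a single document image.
    """
    if weights is None:
        weights = [1.0] * len(model_probs)  # assumption: unweighted average
    total = sum(weights)
    n_classes = len(model_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the predicted class)."""
    return max(range(len(values)), key=lambda i: values[i])


def hard_vote(model_probs: Sequence[Sequence[float]]) -> int:
    """Hard voting, for comparison: majority vote over each model's top-1 class."""
    votes = Counter(argmax(p) for p in model_probs)
    return votes.most_common(1)[0][0]


# Hypothetical softmax outputs for one image and three document classes,
# one row per model (e.g. NASNet-Large, InceptionV3, EfficientNetB3).
probs = [
    [0.50, 0.45, 0.05],
    [0.48, 0.42, 0.10],
    [0.05, 0.90, 0.05],
]
print(hard_vote(probs))          # 0: two models narrowly prefer class 0
print(argmax(soft_vote(probs)))  # 1: the confident third model tips the average
```

The example shows the usual argument for soft voting: because it keeps each model's confidence instead of only its top-1 label, a single highly confident model can outweigh two models that are nearly undecided.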
Acknowledgements
This work has been supported by the Kocaeli University Scientific Research and Development Support Program (BAP) in Turkey under project number FBA-2020-2152.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sevim, S., Omurca, S.İ., Ekinci, E. (2023). Improving Accuracy of Document Image Classification Through Soft Voting Ensemble. In: Smart Applications with Advanced Machine Learning and Human-Centred Problem Design. ICAIAME 2021. Engineering Cyber-Physical Systems and Critical Infrastructures, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-031-09753-9_13
DOI: https://doi.org/10.1007/978-3-031-09753-9_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09752-2
Online ISBN: 978-3-031-09753-9
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)