Abstract
In this paper, we explore the utility of Local Binary Pattern (LBP) descriptors and a variance measure for developing efficient techniques to segment a large collection of historical machine-printed document pages. The segmentation result helps organize the document pages in a structured format, which is useful in applications such as historical document access. In our experiments, three basic reference models, namely background, text, and image models, are used to segment various non-text information together with the text. The method is tested on an archive of Portuguese historical documents and shows promising results.
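To make the two texture measures concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the basic 8-neighbour, radius-1 LBP operator and the neighbourhood variance measure described by Ojala et al. The function names (`lbp_and_variance`, `block_histogram`, `classify_block`), the use of NumPy, and the L1 histogram distance used to match a block against the background, text, and image reference models are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np

# Offsets (dy, dx) of the 8 neighbours at radius 1, clockwise from top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_and_variance(gray):
    """Return basic LBP codes and neighbourhood variance for interior pixels.

    Assumption: plain 8-neighbour LBP without rotation invariance.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    center = gray[1:h - 1, 1:w - 1]
    # One shifted copy of the image per neighbour, shape (8, h-2, w-2).
    neighbours = np.stack(
        [gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] for dy, dx in OFFSETS]
    )
    # A bit is set when the neighbour is at least as bright as the centre.
    bits = (neighbours >= center).astype(np.int64)
    weights = (2 ** np.arange(8)).reshape(8, 1, 1)
    codes = (bits * weights).sum(axis=0)
    # Variance of the 8 neighbour grey values (a local contrast measure).
    variance = neighbours.var(axis=0)
    return codes, variance


def block_histogram(codes):
    """Normalised 256-bin histogram of LBP codes for one image block."""
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()


def classify_block(codes, models):
    """Pick the reference model whose histogram is closest in L1 distance.

    Assumption: `models` maps a label such as "background", "text", or
    "image" to a reference histogram; the distance is a stand-in for
    whatever similarity measure the paper actually uses.
    """
    hist = block_histogram(codes)
    return min(models, key=lambda name: np.abs(hist - models[name]).sum())
```

In this sketch a page would be cut into blocks, each block summarised by its LBP histogram (optionally combined with the variance values), and each block labelled with the nearest reference model.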
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bhowmik, T.K., Kar, M. (2013). Text Localization in Historical Document Images with Local Binary Patterns and Variance Models. In: Maji, P., Ghosh, A., Murty, M.N., Ghosh, K., Pal, S.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2013. Lecture Notes in Computer Science, vol 8251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45062-4_69
DOI: https://doi.org/10.1007/978-3-642-45062-4_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45061-7
Online ISBN: 978-3-642-45062-4
eBook Packages: Computer Science, Computer Science (R0)