Abstract
Spammers are constantly developing new spam technologies, the latest of which is image spam. Until now, research on spam image identification has considered properties such as colour, size, compressibility, entropy and content. However, we feel that the identification methods developed so far are limited by embedded obfuscation such as complex backgrounds, compression artifacts and a wide variety of fonts and formats. To overcome these limitations, we propose a 4-stage methodology that uses both the low-level features and the content of spam images. The method handles images with noise and images without noise separately. In addition, the colour properties of the images are altered so that OCR (Optical Character Recognition) can more easily read the text embedded in the image. The proposed method is tested on a dataset of 1984 spam images and is found to be effective in identifying all types of spam images: those containing (1) only text, (2) only images or (3) both text and images. The experimental results are encouraging, with the technique achieving an accuracy of 92%.
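The abstract does not say which colour alterations are applied before OCR. As a purely illustrative sketch, one common preparation is a linear contrast stretch followed by thresholding. The function names (`stretch_contrast`, `binarize`), the threshold of 128 and the sample pixel values below are assumptions made for this example and are not taken from the paper.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale intensities in the range 0..255


def stretch_contrast(img: Gray) -> Gray:
    """Linearly rescale intensities so the darkest pixel maps to 0
    and the brightest to 255 (hypothetical helper, not the paper's method)."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Uniform image: there is no range to stretch, return a copy unchanged.
        return [row[:] for row in img]
    scale = 255.0 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in img]


def binarize(img: Gray, threshold: int = 128) -> Gray:
    """Map each pixel to pure white (255) or pure black (0).
    The threshold value is an assumed default."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]


# A tiny low-contrast patch: values cluster between 100 and 140.
low_contrast = [
    [100, 102, 140],
    [101, 138, 139],
]

# After stretching, the two clusters separate cleanly around the threshold.
prepared = binarize(stretch_contrast(low_contrast))
```

In a real pipeline the resulting high-contrast image would be handed to an OCR engine; that step is omitted here because the abstract does not name the OCR tool used.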
Copyright information
© 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Gupta, A., Singhal, C., Aggarwal, S. (2012). An Improved Anti Spam Filter Based on Content, Low Level Features and Noise. In: Meghanathan, N., Chaki, N., Nagamalai, D. (eds) Advances in Computer Science and Information Technology. Networks and Communications. CCSIT 2012. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 84. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27299-8_59
DOI: https://doi.org/10.1007/978-3-642-27299-8_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27298-1
Online ISBN: 978-3-642-27299-8
eBook Packages: Computer Science, Computer Science (R0)