Abstract
In recent years, email has become one of the most widely used means of communication and a part of our daily lives because of its efficiency. However, email is also vulnerable to exploitation, most notably through spam. Identifying spam is challenging, and new filtering algorithms have therefore been investigated in recent years. To address this problem, we propose a new spam detection approach based on three Nearest Neighbor (NN) algorithms, which are among the simplest classifiers in machine learning: K-NN, Weighted K-NN (WKNN) and K-d tree. To achieve high performance, we pre-process the emails with NLP techniques before extracting features. We then extract features using Bag-of-Words (BOW), N-grams and Term Frequency-Inverse Document Frequency (TF-IDF). In this paper, we compare the three classifiers. Experimental results show that K-NN achieved high performance on four evaluation measures, namely Precision, Recall, F1-score and Accuracy, on both the Enron and LingSpam datasets.
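The abstract names NLP pre-processing followed by Bag-of-Words, N-gram and TF-IDF features but gives no details. As a minimal sketch, assuming lowercasing, alphabetic tokenization and a small hypothetical stop-word list (not the authors' actual pipeline), the three representations could be computed in plain Python like this:

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the paper's NLP steps may differ.
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "for", "is", "you", "your"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Contiguous word n-grams joined by a space (n=1 gives unigrams)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs, n=1):
    """Bag-of-Words: raw n-gram counts per document, word order discarded."""
    return [Counter(ngrams(preprocess(d), n)) for d in docs]


def tf_idf(docs, n=1):
    """TF-IDF with tf = raw count and unsmoothed idf = log(N / df)."""
    counts = bag_of_words(docs, n)
    num_docs = len(counts)
    # Document frequency: in how many documents each n-gram occurs at least once.
    df = Counter(term for c in counts for term in c)
    return [
        {term: tf * math.log(num_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]


emails = [
    "Win a FREE prize now, click to claim your prize",
    "Meeting agenda for the project review on Monday",
    "Claim your free gift card now",
]
print(bag_of_words(emails)[0])  # unigram counts for the first email
print(tf_idf(emails, n=2)[2])   # bigram TF-IDF weights for the third email
```

The idf here is the unsmoothed log(N/df); scikit-learn's `TfidfVectorizer`, for example, smooths the idf and L2-normalizes each row by default, so its weights will differ from this sketch.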
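K-NN and WKNN differ only in how the k nearest neighbours vote. The sketch below assumes Euclidean distance over numeric feature vectors and inverse-distance weights for WKNN (one common scheme; the abstract does not specify the weighting). It also computes the four reported measures with spam as the positive class. The toy vectors are invented for illustration and are not data from the paper.

```python
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(train, query, k):
    """Brute-force search: the k (distance, label) pairs closest to query."""
    dists = [(euclidean(x, query), label) for x, label in train]
    return sorted(dists, key=lambda p: p[0])[:k]


def knn_predict(train, query, k=3):
    """Plain K-NN: unweighted majority vote among the k nearest neighbours."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]


def wknn_predict(train, query, k=3, eps=1e-9):
    """Weighted K-NN: each neighbour votes with weight 1 / distance (assumed scheme)."""
    scores = defaultdict(float)
    for dist, label in k_nearest(train, query, k):
        scores[label] += 1.0 / (dist + eps)  # eps guards against a zero distance
    return max(scores, key=scores.get)


def evaluate(y_true, y_pred, positive="spam"):
    """Precision, recall, F1-score and accuracy, treating spam as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Invented 2-D vectors standing in for real email features.
train = [((0.0, 0.0), "ham"), ((2.0, 0.0), "spam"),
         ((0.0, 2.0), "spam"), ((5.0, 5.0), "ham")]
query = (0.1, 0.1)
print(knn_predict(train, query, k=3))   # → spam (two of three neighbours are spam)
print(wknn_predict(train, query, k=3))  # → ham (the closest neighbour dominates)
print(evaluate(["spam", "spam", "spam", "ham"], ["spam", "spam", "ham", "ham"]))
```

On this toy query the two voting rules disagree, which is exactly the effect distance weighting introduces: a very close neighbour can outvote several distant ones.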
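The abstract does not explain how the K-d tree is used. A common interpretation, assumed here, is that it indexes the training vectors so that the K-NN majority vote can be computed without comparing each query against every training email. The following is a minimal sketch of exact k-nearest-neighbour search in a k-d tree; `build_kdtree`, `knn_search` and `kdtree_knn_predict` are hypothetical names, and the toy vectors are invented.

```python
import heapq
import math
from collections import Counter


class Node:
    """One k-d tree node: a training point, its label and its splitting axis."""

    def __init__(self, point, label, axis, left, right):
        self.point, self.label, self.axis = point, label, axis
        self.left, self.right = left, right


def build_kdtree(items, depth=0):
    """Build a k-d tree from (point, label) pairs by splitting at the median,
    cycling through the coordinate axes level by level."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return Node(point, label, axis,
                build_kdtree(items[:mid], depth + 1),
                build_kdtree(items[mid + 1:], depth + 1))


def knn_search(node, query, k, heap=None):
    """Exact k-nearest-neighbour search.

    heap is a max-heap of the k best candidates, stored as (-distance, label)
    so that heap[0] is always the farthest of the current k neighbours.
    """
    if heap is None:
        heap = []
    if node is None:
        return heap
    d = math.dist(node.point, query)
    if len(heap) < k:
        heapq.heappush(heap, (-d, node.label))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, node.label))
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    knn_search(near, query, k, heap)
    # The far subtree can only hold a closer point if the splitting plane
    # is nearer than the current k-th neighbour (or fewer than k were found).
    if len(heap) < k or abs(diff) < -heap[0][0]:
        knn_search(far, query, k, heap)
    return heap


def kdtree_knn_predict(tree, query, k=3):
    """Majority vote over the k neighbours returned by the k-d tree search."""
    votes = Counter(label for _, label in knn_search(tree, query, k))
    return votes.most_common(1)[0][0]


train = [((0.0, 0.0), "ham"), ((2.0, 0.0), "spam"),
         ((0.0, 2.0), "spam"), ((5.0, 5.0), "ham")]
tree = build_kdtree(train)
print(kdtree_knn_predict(tree, (0.1, 0.1), k=3))  # → spam
```

Pruning is what makes the k-d tree faster than brute force, but it weakens as dimensionality grows; with thousands of sparse BOW or TF-IDF dimensions the search may end up visiting most nodes.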
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Hnini, G., Riffi, J., Mahraz, M.A., Yahyaouy, A., Tairi, H. (2021). Spam Filtering System Based on Nearest Neighbor Algorithms. In: Masrour, T., El Hassani, I., Cherrafi, A. (eds) Artificial Intelligence and Industrial Applications. A2IA 2020. Lecture Notes in Networks and Systems, vol 144. Springer, Cham. https://doi.org/10.1007/978-3-030-53970-2_4
DOI: https://doi.org/10.1007/978-3-030-53970-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-53969-6
Online ISBN: 978-3-030-53970-2
eBook Packages: Intelligent Technologies and Robotics (R0)