Spam Filtering System Based on Nearest Neighbor Algorithms

Hnini, Ghizlane; Riffi, Jamal; Mahraz, Mohamed Adnane; Yahyaouy, Ali; Tairi, Hamid

doi:10.1007/978-3-030-53970-2_4

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 144)

Included in the following conference series:

International Conference on Artificial Intelligence & Industrial Applications

Abstract

In recent years, email has become the most used means of communication and part and parcel of our lives, owing to its efficiency. However, email is vulnerable to exploitation, most notably through spam. Identifying spam remains challenging, and new filtering algorithms have recently been investigated. To address this problem, we propose an approach to spam detection based on three Nearest Neighbor (NN) algorithms, which are among the simplest classifiers in machine learning: K-NN, WKNN and K-d tree. To achieve high performance, we pre-process the emails using NLP techniques before extracting features. We then extract features using Bag-of-Words (BOW), N-gram and Term Frequency-Inverse Document Frequency (TF-IDF). In this paper, we compare the three classifiers. The experimental results show that K-NN achieved high performance on four measures, namely Precision, Recall, F1-score and Accuracy, on both the Enron and LingSpam datasets.
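The pipeline outlined in the abstract (tokenize, extract BOW/N-gram TF-IDF features, classify by nearest neighbors, score with four measures) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy emails, all function names, the smoothed IDF formula, the use of cosine similarity, and similarity-weighted voting as a stand-in for WKNN are assumptions made for illustration. The search is brute force; a K-d tree would be an index that speeds up the same neighbor lookup and is not shown here.

```python
import math
from collections import Counter

def tokenize(text, n=1):
    """Lowercase, split on whitespace, emit word n-grams (n=1 is bag-of-words)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def fit_idf(docs):
    """Smoothed inverse document frequency per term (assumed formula)."""
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + len(docs)) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """Sparse, L2-normalized TF-IDF vector; unseen terms are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def knn_predict(query, train_vecs, train_labels, k=3, weighted=False):
    """Majority vote among the k most similar training vectors.
    weighted=True weights each vote by its similarity (a simple WKNN variant)."""
    sims = sorted(
        ((cosine(query, v), y) for v, y in zip(train_vecs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, label in sims:
        votes[label] += sim if weighted else 1.0
    return votes.most_common(1)[0][0]

def scores(y_true, y_pred, positive="spam"):
    """Return (precision, recall, F1, accuracy) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return precision, recall, f1, accuracy

# Hypothetical toy data, not drawn from Enron or LingSpam.
train = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
    ("lunch on monday", "ham"),
]
test = [("claim your free money", "spam"), ("monday project meeting", "ham")]

train_docs = [tokenize(text) for text, _ in train]
train_labels = [label for _, label in train]
idf = fit_idf(train_docs)
train_vecs = [tfidf(doc, idf) for doc in train_docs]

predictions = [
    knn_predict(tfidf(tokenize(text), idf), train_vecs, train_labels, k=3)
    for text, _ in test
]
metrics = scores([label for _, label in test], predictions)
print(predictions, metrics)
```

Passing `n=2` to `tokenize` switches the features from single words to bigrams, and `weighted=True` in `knn_predict` switches from plain K-NN voting to the similarity-weighted variant.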

Author information

Authors and Affiliations

LIIAN, Faculty of Sciences Dhar El mahraz, Sidi Mohamed Ben Abdelah University, Fez, Morocco
Ghizlane Hnini, Jamal Riffi, Mohamed Adnane Mahraz, Ali Yahyaouy & Hamid Tairi

Corresponding author

Correspondence to Ghizlane Hnini.

Editor information

Editors and Affiliations

Department of Mathematics and Computer Science, National Graduate School for Arts and Crafts, Meknes, Morocco
Tawfik Masrour
Department of Industrial and Manufacturing Engineering, National Graduate School for Arts and Crafts, Meknes, Morocco
Ibtissam El Hassani
Department of Industrial and Manufacturing Engineering, National Graduate School for Arts and Crafts, Meknes, Morocco
Anass Cherrafi

About this paper

Cite this paper

Hnini, G., Riffi, J., Mahraz, M.A., Yahyaouy, A., Tairi, H. (2021). Spam Filtering System Based on Nearest Neighbor Algorithms. In: Masrour, T., El Hassani, I., Cherrafi, A. (eds) Artificial Intelligence and Industrial Applications. A2IA 2020. Lecture Notes in Networks and Systems, vol 144. Springer, Cham. https://doi.org/10.1007/978-3-030-53970-2_4

DOI: https://doi.org/10.1007/978-3-030-53970-2_4
Published: 19 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-53969-6
Online ISBN: 978-3-030-53970-2
eBook Packages: Intelligent Technologies and Robotics (R0)
