Abstract
A reliable dataset is essential for solving classification problems: data is required to train, test, and evaluate machine learning models. SMS Phishing (Smishing) is a fraudulent activity in which an attacker sends a malicious text message to a smartphone user, causing financial or personal loss to the victim. Detecting it is a binary classification problem in which messages are categorized as malicious (Smishing) or legitimate (Ham). Only a few approaches have been proposed for identifying smishing messages, and according to our literature survey no smishing dataset is publicly available yet. Hence, we have compiled a dataset of smishing messages extracted from different internet sources. The dataset comprises 5971 text messages: 638 smishing messages, 489 spam messages, and 4844 ham messages. It can be used to extract smishing features and to classify text messages with machine learning algorithms. This paper also presents an experimental evaluation of the dataset for smishing message categorization using keyword classification. The dataset can serve as a baseline for future research on SMS Phishing.
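To make the idea of keyword classification concrete, the following is a minimal Python sketch, not the authors' method. The keyword list, the `threshold` parameter, and the function names are hypothetical and chosen only for illustration; the class counts are the ones reported in the abstract.

```python
from __future__ import annotations

import re

# Hypothetical keyword list for illustration only; not the authors' feature set.
SMISHING_KEYWORDS = {"verify", "account", "suspended", "click", "otp", "kyc", "prize"}

# Class counts as reported in the abstract.
CLASS_COUNTS = {"smishing": 638, "spam": 489, "ham": 4844}


def tokenize(message: str) -> set[str]:
    """Lowercase the message and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", message.lower()))


def classify(message: str, threshold: int = 2) -> str:
    """Label a message 'smishing' if it contains at least `threshold` keywords."""
    hits = tokenize(message) & SMISHING_KEYWORDS
    return "smishing" if len(hits) >= threshold else "ham"


def class_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each class's share of the dataset as a fraction of the total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

Calling `classify("Your account is suspended, click here to verify")` matches four of the assumed keywords and is labelled as smishing under the default threshold, while `class_shares(CLASS_COUNTS)` makes the class imbalance of the dataset explicit. A real evaluation would derive the keywords from the data rather than fix them by hand.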
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mishra, S., Soni, D. (2023). SMS Phishing Dataset for Machine Learning and Pattern Recognition. In: Abraham, A., Hanne, T., Gandhi, N., Manghirmalani Mishra, P., Bajaj, A., Siarry, P. (eds) Proceedings of the 14th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2022). SoCPaR 2022. Lecture Notes in Networks and Systems, vol 648. Springer, Cham. https://doi.org/10.1007/978-3-031-27524-1_57
DOI: https://doi.org/10.1007/978-3-031-27524-1_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-27523-4
Online ISBN: 978-3-031-27524-1
eBook Packages: Intelligent Technologies and Robotics (R0)