Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 648)

Abstract

The reliability of a dataset is essential for solving classification problems: data is required to train, test, and evaluate machine learning models. SMS phishing (smishing) is a fraudulent activity in which an attacker sends a malicious text message to a smartphone user, causing financial or personal loss to the victim. Detecting it is a binary classification problem in which messages are categorized as malicious (smishing) or legitimate (ham). Few research works have addressed the identification of smishing messages, and according to our literature survey, no smishing dataset is publicly available yet. Hence, we have composed a smishing dataset from messages extracted from different internet sources. The dataset comprises 5971 text messages: 638 smishing messages, 489 spam messages, and 4844 ham messages. It can be used to extract smishing features and to classify text messages using machine learning algorithms. This paper also presents an experimental evaluation of the dataset for smishing message categorization using keyword classification. The dataset can serve as a baseline for future research on SMS phishing.
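The abstract mentions keyword classification without giving details, so the following is only a minimal sketch of what such a baseline might look like. The keyword list, the URL bonus, and the threshold are all hypothetical choices made for illustration; they are not taken from the paper.

```python
import re

# Hypothetical keyword list for illustration only; the paper's actual
# keywords are not stated in the abstract.
SMISHING_KEYWORDS = {
    "verify", "account", "blocked", "kyc", "otp",
    "click", "prize", "suspended",
}

# Matches an http(s) or www-style link anywhere in the message.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def keyword_score(message: str) -> int:
    """Count how many distinct keywords occur in the message."""
    tokens = set(re.findall(r"[a-z0-9]+", message.lower()))
    return len(tokens & SMISHING_KEYWORDS)


def classify(message: str, threshold: int = 2) -> str:
    """Label a message 'smishing' or 'ham' with a simple keyword rule.

    Assumption: a link adds one point to the score, and a message is
    flagged once the score reaches `threshold`.
    """
    score = keyword_score(message)
    if URL_PATTERN.search(message):
        score += 1
    return "smishing" if score >= threshold else "ham"


if __name__ == "__main__":
    print(classify("Your account is blocked. Click http://example.test to verify KYC"))
    print(classify("Are we still meeting for lunch tomorrow?"))
```

A rule of this kind is easy to inspect and can act as a reference point against which learned classifiers trained on the dataset are compared.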

Author information

Corresponding author

Correspondence to Sandhya Mishra.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mishra, S., Soni, D. (2023). SMS Phishing Dataset for Machine Learning and Pattern Recognition. In: Abraham, A., Hanne, T., Gandhi, N., Manghirmalani Mishra, P., Bajaj, A., Siarry, P. (eds) Proceedings of the 14th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2022). SoCPaR 2022. Lecture Notes in Networks and Systems, vol 648. Springer, Cham. https://doi.org/10.1007/978-3-031-27524-1_57
