Sensitive Content Classification

  • Conference paper
Advances in Data and Information Sciences (ICDIS 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 796)

Abstract

Sharing information on the Internet has become incredibly easy. However, this ease comes with a great risk of disclosing personal or private information, whether knowingly or unknowingly. Compromised information can be harmful, as it can lead to various forms of online harassment and malpractice. Individuals therefore need to be careful about what they share online, and a tool is required that can classify the sensitivity of a text and alert them. Many existing approaches classify a text based on the number of sensitive tokens identified. This is not enough, because such approaches cannot understand the context of the text. In this paper, we propose a hybrid model that leverages the advantages of CNN, BiLSTM, and a multi-head attention mechanism. We analyze the patterns, compare the results of standard machine learning and deep learning models, and discuss the advantages and disadvantages of each model. Our proposed model shows results similar to or better than those of the ALBERT model, with a significantly shorter training time.
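To illustrate why counting sensitive tokens can miss context, here is a minimal sketch of a token-count baseline. It is not the method of any cited work: the lexicon `SENSITIVE_TOKENS`, the function names, and the two example posts are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical lexicon for illustration only; not taken from the paper.
SENSITIVE_TOKENS = {"address", "password", "diagnosis", "salary"}


def count_sensitive_tokens(text: str) -> int:
    """Count how many tokens in `text` appear in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(token in SENSITIVE_TOKENS for token in tokens)


def is_sensitive(text: str, threshold: int = 1) -> bool:
    """Flag a text as sensitive when the token count reaches `threshold`."""
    return count_sensitive_tokens(text) >= threshold


# Two made-up posts that share the token "address" in different senses.
private_post = "My home address is 12 Elm Street, come over tonight."
harmless_post = "The mayor will address the crowd tonight."

print(is_sensitive(private_post), is_sensitive(harmless_post))
```

Both posts are flagged, because each contains the token "address" once. The counter has no way to tell a disclosed home address from the verb "to address", which is the kind of contextual distinction a sequence model is meant to capture.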

Acknowledgements

We would like to express our gratitude to Ruggero G. Pensa, Ph.D., University of Torino, Italy, for providing us with the dataset.

Author information

Corresponding author

Correspondence to Harsha Vardhan Puvvadi.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Puvvadi, H.V., Shyamala, L. (2024). Sensitive Content Classification. In: Tiwari, S., Trivedi, M.C., Kolhe, M.L., Singh, B.K. (eds) Advances in Data and Information Sciences. ICDIS 2023. Lecture Notes in Networks and Systems, vol 796. Springer, Singapore. https://doi.org/10.1007/978-981-99-6906-7_21
