Sensitive Content Classification

Puvvadi, Harsha Vardhan; Shyamala L

doi:10.1007/978-981-99-6906-7_21

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 796)

Included in the following conference series:

International Conference on Data & Information Sciences

Abstract

The Internet has made it incredibly easy to share any sort of information online. However, this ease of sharing carries a serious risk of disclosing personal or private information, whether knowingly or unknowingly. Compromised information can be harmful because it can lead to various forms of online harassment and malpractice, which is why individuals need to be careful about what they share online. A tool is therefore needed that can classify the sensitivity of a text and alert the individual. Many existing approaches classify a text by the number of sensitive tokens it contains. This is not enough, because such approaches cannot understand the context of the text. In this paper, we propose a hybrid model that leverages the advantages of CNN, BiLSTM, and a multi-head attention mechanism. We analyze the patterns, compare the results of standard machine learning and deep learning models, and discuss the advantages and disadvantages of each model. The proposed model achieved results comparable to or better than those of the ALBERT model with a significantly shorter training time.
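The limitation of token-count approaches noted in the abstract can be illustrated with a minimal sketch. This is not the authors' method or any baseline from the paper: the lexicon `SENSITIVE_TERMS`, the function names, the threshold, and both example sentences are hypothetical and chosen only to show how counting tokens ignores context.

```python
import re

# Hypothetical lexicon of sensitive terms; an assumption, not taken from the paper.
SENSITIVE_TERMS = {"password", "diagnosis", "salary", "address", "passport"}


def count_sensitive_tokens(text: str) -> int:
    """Count the tokens in `text` that appear in the sensitive-term lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(token in SENSITIVE_TERMS for token in tokens)


def is_sensitive(text: str, threshold: int = 1) -> bool:
    """Flag a text as sensitive when it has at least `threshold` lexicon hits."""
    return count_sensitive_tokens(text) >= threshold


# General advice: discloses nothing personal, yet contains a lexicon term.
general = "A strong password should mix letters and digits."
# Personal disclosure: contains no lexicon term at all.
personal = "I was let go last week and I cannot pay my rent."

print(is_sensitive(general))   # flagged, although nothing private is shared
print(is_sensitive(personal))  # not flagged, although the text is personal
```

In this sketch the general advice is flagged while the personal disclosure is missed, which is the kind of context blindness that motivates a model reading the whole sentence rather than counting individual tokens.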

Acknowledgements

We would like to express our gratitude to Ruggero G. Pensa, Ph.D., University of Torino, Italy, for providing us with the dataset.

Author information

Authors and Affiliations

Vellore Institute of Technology Chennai Campus, Chennai, Tamil Nadu, India
Harsha Vardhan Puvvadi & Shyamala L

Corresponding author

Correspondence to Harsha Vardhan Puvvadi.

Editor information

Editors and Affiliations

KIET Group of Institutions, Ghaziabad, Uttar Pradesh, India
Shailesh Tiwari
Department of Computer Science and Engineering, National Institute of Technology Agartala, Tripura, India
Munesh C. Trivedi
Faculty of Engineering and Science, University of Agder, Kristiansand, Norway
Mohan L. Kolhe
Department of Computer Science and Engineering, R. B. S. Engineering Technical Campus, Bichpuri, Agra, Uttar Pradesh, India
Brajesh Kumar Singh

Cite this paper

Puvvadi, H.V., Shyamala L (2024). Sensitive Content Classification. In: Tiwari, S., Trivedi, M.C., Kolhe, M.L., Singh, B.K. (eds) Advances in Data and Information Sciences. ICDIS 2023. Lecture Notes in Networks and Systems, vol 796. Springer, Singapore. https://doi.org/10.1007/978-981-99-6906-7_21

DOI: https://doi.org/10.1007/978-981-99-6906-7_21
Published: 03 January 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6905-0
Online ISBN: 978-981-99-6906-7
eBook Packages: Intelligent Technologies and Robotics (R0)
