Abstract
Sharing information on the Internet has become remarkably easy. However, this ease brings a serious risk of disclosing personal or private information, whether knowingly or unknowingly. Compromised information can be harmful, as it can lead to various forms of online harassment and malpractice. Individuals therefore need to be careful about what they share online, and a tool is needed that can classify the sensitivity of a text and alert them. Many existing approaches classify text based on the number of sensitive tokens identified. This is not enough, because such approaches cannot understand the context of the text. In this paper, we propose a hybrid model that leverages the advantages of CNN, BiLSTM, and a multi-head attention mechanism. We analyze the patterns, compare the results of standard machine learning and deep learning models, and discuss the advantages and disadvantages of each model. Our proposed model achieves results similar to or better than those of the ALBERT model with a significantly shorter training time.
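The limitation of token-counting approaches mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method or any baseline from the paper: the lexicon `SENSITIVE_TOKENS`, the `threshold` parameter, the function names, and the two example sentences are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical lexicon of "sensitive" tokens; an assumption, not taken from the paper.
SENSITIVE_TOKENS = {"password", "diagnosis", "salary", "address", "ssn"}


def count_sensitive_tokens(text):
    """Count how many words in the text appear in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for token in tokens if token in SENSITIVE_TOKENS)


def is_sensitive(text, threshold=1):
    """Flag the text when the lexicon hit count reaches the threshold."""
    return count_sensitive_tokens(text) >= threshold


# Generic advice that happens to contain a lexicon word.
generic = "Never reuse a password across websites."
# A personal health disclosure that contains no lexicon word.
personal = "I found out yesterday that I have diabetes."

print(is_sensitive(generic))   # flagged, although nothing private is disclosed
print(is_sensitive(personal))  # not flagged, although it is a personal disclosure
```

Under these assumptions the counter flags the generic sentence and misses the personal one, which is the kind of context-dependent error that motivates a model that reads the whole sentence rather than isolated tokens.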
Acknowledgements
We would like to express our gratitude to Ruggero G. Pensa, Ph.D., University of Torino, Italy, for providing us with the dataset.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Puvvadi, H.V., Shyamala, L. (2024). Sensitive Content Classification. In: Tiwari, S., Trivedi, M.C., Kolhe, M.L., Singh, B.K. (eds) Advances in Data and Information Sciences. ICDIS 2023. Lecture Notes in Networks and Systems, vol 796. Springer, Singapore. https://doi.org/10.1007/978-981-99-6906-7_21
DOI: https://doi.org/10.1007/978-981-99-6906-7_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6905-0
Online ISBN: 978-981-99-6906-7
eBook Packages: Intelligent Technologies and Robotics (R0)