Abstract
The growing presence of digital technology in our daily lives makes big data essential in many fields. Today, more and more companies and communities are turning to big data management to support decision-making. Understanding and better managing big data makes it possible to collect and analyze relevant information and to make predictions from it. However, vulnerabilities exist at every level of big data platforms, including the data level. Despite the tremendous efforts and resources offered by big data tools and providers, big data platforms remain vulnerable to many existing forms of attack. Therefore, new kinds of solutions are needed to strengthen big data security. Predictive models offer promising solutions for additional security layers. In this paper, we summarize and discuss contributions that help protect big data environments using machine learning and deep learning. We also group the most sensitive security aspects that should be addressed to protect valuable data. All the contributions and dimensions are addressed through a set of security use cases, namely malware detection, intrusion detection, anomaly detection, access control, and data ingestion controls. Furthermore, we provide comparison results for the different techniques to show their efficiency.
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Gahi, Y., El Alaoui, I. (2021). Machine Learning and Deep Learning Models for Big Data Issues. In: Maleh, Y., Shojafar, M., Alazab, M., Baddi, Y. (eds) Machine Intelligence and Big Data Analytics for Cybersecurity Applications. Studies in Computational Intelligence, vol 919. Springer, Cham. https://doi.org/10.1007/978-3-030-57024-8_2
DOI: https://doi.org/10.1007/978-3-030-57024-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57023-1
Online ISBN: 978-3-030-57024-8
eBook Packages: Intelligent Technologies and Robotics (R0)