Machine Learning and Deep Learning Models for Big Data Issues

Gahi, Youssef; El Alaoui, Imane

doi:10.1007/978-3-030-57024-8_2

Part of the book series: Studies in Computational Intelligence (SCI, volume 919)

Abstract

The growing role of digital technology in our daily lives makes big data essential in many fields. Today, more and more companies and communities are turning to big data management to support decision-making. Understanding and better managing big data makes it possible to collect and analyze relevant information in order to make predictions. However, vulnerabilities exist at every level of big data platforms, including the data level. Despite the tremendous efforts and resources that big data tool vendors and providers have invested, big data platforms remain vulnerable to many existing forms of attack. New kinds of solutions are therefore needed to strengthen big data security, and predictive models offer promising ways to add further security layers. In this chapter, we summarize and discuss contributions that help protect big data environments using machine learning and deep learning. We also group the most sensitive security aspects that should be addressed to protect valuable data. All the contributions and dimensions are addressed through a set of security use cases, namely malware detection, intrusion detection, anomaly detection, access control, and data ingestion control. Furthermore, we compare the results of different techniques to show their efficiency.
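Anomaly detection is one of the use cases listed above. As a purely illustrative sketch, and not one of the machine learning or deep learning techniques the chapter surveys, the Python class below shows the simplest form of a predictive "additional security layer". It learns the running mean and variance of a single metric (assumed here to be requests per second on a data ingestion endpoint) with Welford's online algorithm and flags values that lie more than an assumed number of standard deviations from that baseline. The class name, the threshold of 3 and the warm-up length of 10 are hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamingZScoreDetector:
    """Hypothetical streaming anomaly detector (statistical baseline, not from the chapter)."""
    threshold: float = 3.0  # assumed cut-off, in standard deviations
    warmup: int = 10        # assumed number of samples seen before flagging starts
    n: int = 0              # samples folded into the model so far
    mean: float = 0.0       # running mean
    m2: float = 0.0         # running sum of squared deviations (Welford)

    def update(self, x: float) -> bool:
        """Score x against the current baseline, then fold x into the statistics."""
        is_anomaly = False
        # At least two samples are required for a sample standard deviation.
        if self.n >= max(self.warmup, 2):
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's online update of the mean and the sum of squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


if __name__ == "__main__":
    detector = StreamingZScoreDetector()
    traffic = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 500]
    print([detector.update(v) for v in traffic])
```

In this sketch, every value, including flagged ones, is folded into the statistics. A sustained attack would therefore gradually shift the baseline, so a real deployment might exclude flagged values or replace this baseline with a learned model.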

Author information

Authors and Affiliations

Laboratoire de Recherche en Sciences de l’Ingénieur, Ibn Tofail University, Kénitra, Morocco
Youssef Gahi
LASTID, Ibn Tofail University, Kénitra, Morocco
Imane El Alaoui

Corresponding author

Correspondence to Youssef Gahi.

Editor information

Editors and Affiliations

Sultan Moulay Slimane University, Beni Mellal, Morocco
Yassine Maleh
Institute for Communication Systems, University of Surrey, Guildford, UK
Mohammad Shojafar
Charles Darwin University, Darwin, NT, Australia
Mamoun Alazab
Chouaib Doukkali University El Jadida, El Jadida, Morocco
Youssef Baddi

About this chapter

Cite this chapter

Gahi, Y., El Alaoui, I. (2021). Machine Learning and Deep Learning Models for Big Data Issues. In: Maleh, Y., Shojafar, M., Alazab, M., Baddi, Y. (eds) Machine Intelligence and Big Data Analytics for Cybersecurity Applications. Studies in Computational Intelligence, vol 919. Springer, Cham. https://doi.org/10.1007/978-3-030-57024-8_2

DOI: https://doi.org/10.1007/978-3-030-57024-8_2
Published: 15 December 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57023-1
Online ISBN: 978-3-030-57024-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)