Machine Learning and Deep Learning Models for Big Data Issues

Machine Intelligence and Big Data Analytics for Cybersecurity Applications

Part of the book series: Studies in Computational Intelligence (SCI, volume 919)

Abstract

The growing role of digital technology in daily life has made big data essential in many fields. More and more companies and communities are turning to big data management to support decision-making. Understanding and better managing big data makes it possible to collect and analyze relevant information and to make predictions. However, vulnerabilities exist at every level of big data platforms, including the data level. Despite the considerable effort and resources devoted by big data tools and providers, these platforms remain vulnerable to many existing forms of attack. New kinds of solutions are therefore needed to strengthen big data security, and predictive models offer a promising additional security layer. In this paper, we summarize and discuss contributions that help protect big data environments using machine learning and deep learning. We also group the most sensitive security aspects that should be addressed to protect valuable data. All contributions and dimensions are addressed through a set of security use cases, namely malware detection, intrusion detection, anomaly detection, access control, and data ingestion controls. Furthermore, we provide comparison results for different techniques to show their efficiency.

Author information

Corresponding author

Correspondence to Youssef Gahi.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Gahi, Y., El Alaoui, I. (2021). Machine Learning and Deep Learning Models for Big Data Issues. In: Maleh, Y., Shojafar, M., Alazab, M., Baddi, Y. (eds) Machine Intelligence and Big Data Analytics for Cybersecurity Applications. Studies in Computational Intelligence, vol 919. Springer, Cham. https://doi.org/10.1007/978-3-030-57024-8_2
