The application of neural network for software vulnerability detection: a review

Zhu, Yuhui; Lin, Guanjun; Song, Lipeng; Zhang, Jun

doi:10.1007/s00521-022-08046-y

Review
Published: 27 November 2022

Volume 35, pages 1279–1301, (2023)
Neural Computing and Applications

Yuhui Zhu (affiliations 1, 3), Guanjun Lin (affiliation 2), Lipeng Song (affiliation 3; ORCID: orcid.org/0000-0003-2135-0221) and Jun Zhang (affiliation 4)

Abstract

Deep learning techniques have attracted extensive attention in data-driven software vulnerability detection, thanks to their ability to extract features automatically and their strong performance in identifying vulnerabilities. Many deep learning-based methods have been proposed to speed up vulnerability identification and make it more intelligent. Although these methods have shown significant advantages over traditional machine learning methods, an apparent gap remains between deep learning-based detection systems and human experts in understanding the semantics of potentially vulnerable code. In some real-world vulnerability prediction scenarios, the performance of deep learning-based methods drops by more than 50% compared with their performance in experimental scenarios. By examining and reviewing early software vulnerability detection approaches, we define this phenomenon as the perception gap. Then, from the perspective of the perception gap, this paper explores in depth the current software vulnerability detection methods and how existing solutions endeavor to narrow the gap and advance the field. Finally, we summarize the challenges of this new field and discuss possible future directions.

1 Introduction

With the popularity of mobile devices and computer networks, software systems play a critical role in all aspects of our society. Meanwhile, vulnerabilities in these systems significantly impact businesses and people’s lives [1, 2]. A recent study pointed out that the Internet suffered nearly 800 million malware attacks in the second quarter of 2018, a record high [3]. Moreover, most of these attacks can be attributed to vulnerabilities in software. Additionally, the number of vulnerabilities reported publicly to the Common Vulnerabilities and Exposures (CVE) database has increased annually, reaching 20,000 in 2021.

Identifying vulnerabilities before deploying software is an effective way to reduce potential losses caused by malicious attacks [4]. To identify vulnerabilities effectively, researchers have proposed many detection methods, which can be categorized into static, dynamic, and hybrid techniques [5]. Static techniques, such as rule/template-based analysis [6], static symbolic execution [7, 8], and code similarity detection [9,10,11], analyze a program based on its source code; a high false-positive rate is a significant limitation of these techniques. Dynamic techniques analyze a program by generating specific input data and often suffer from low code coverage [12]. Finally, hybrid techniques analyze a program with a mixture of static and dynamic techniques [5]. However, they also suffer from the limitations of both approaches [13]. These methods improve the efficiency of vulnerability detection to a certain extent. However, as software code grows significantly in size and complexity, they fail to satisfy the increasing need for more efficient and effective detection because they demand extensive manual analysis [14].

To improve the efficiency and effectiveness of vulnerability detection, many pattern recognition and machine learning (ML) techniques have been widely used to build defect prediction models [15,16,17]. Building on pioneering studies, researchers have selected source-code-based features such as function calls [11], software complexity measurements, and code changes [18] as indicators for predicting vulnerable code fragments with ML approaches. However, ML-based techniques still require experts to define the indicators explicitly [19, 20]. Furthermore, it is difficult to capture complex and variable vulnerability patterns and to discover new vulnerabilities using these indicators.
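To make the idea of expert-defined indicators concrete, the following is a minimal sketch, not a method taken from any of the cited studies. It assumes a C function body given as text and computes a few hand-picked indicators of the kind mentioned above (function calls and a rough complexity measure). The names `RISKY_CALLS`, `extract_indicators`, and `SAMPLE_BODY` are hypothetical, and the regular expressions are deliberately crude.

```python
import re

# Hypothetical list of calls an expert might flag as risky; illustrative only.
RISKY_CALLS = ("strcpy", "strcat", "sprintf", "gets", "memcpy")

# C keywords that can be followed by "(" but are not function calls.
C_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}

# Crude pattern: an identifier immediately followed by an opening parenthesis.
CALL_PATTERN = re.compile(r"\b([A-Za-z_]\w*)\s*\(")

# Branching keywords; counting them gives a rough complexity estimate.
BRANCH_PATTERN = re.compile(r"\b(if|for|while|case)\b")


def extract_indicators(body: str) -> dict:
    """Compute hand-defined indicators for one C function body (a sketch)."""
    calls = [name for name in CALL_PATTERN.findall(body)
             if name not in C_KEYWORDS]
    return {
        "num_calls": len(calls),
        "num_risky_calls": sum(1 for name in calls if name in RISKY_CALLS),
        # One path through the code plus one per branching keyword.
        "approx_complexity": 1 + len(BRANCH_PATTERN.findall(body)),
        "num_lines": len(body.strip().splitlines()),
    }


# Illustrative function body, invented for this example.
SAMPLE_BODY = """
if (n > 0) {
    strcpy(dst, src);
}
log_event(n);
"""

if __name__ == "__main__":
    print(extract_indicators(SAMPLE_BODY))
```

A feature vector of this kind would then be passed to a conventional classifier. The sketch also shows the limitation noted above: every indicator, including the list of risky calls, has to be chosen by a person in advance, so patterns outside that list go unseen.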

The emerging deep learning (DL) approaches offer new potential for software vulnerability detection (SVD). On the one hand, DL approaches can extract high-level features automatically, relieving experts of tedious feature engineering tasks [21]. On the other hand, DL approaches usually generalize better and can improve detection performance [22, 23]. They can discover latent features that a human expert might never consider including and represent them in a high-dimensional space [24, 25]. Therefore, DL has found applications in SVD, and DL-based SVD has become a promising field.

Researchers aim to build vulnerability detection systems that, like an experienced expert, can judge whether a piece of code is vulnerable, so that developers can be assisted in identifying and fixing vulnerabilities more efficiently. DL-based SVD methods are capable of reasoning about and understanding code semantics, which shows that this goal may be achievable. Researchers are currently exploring the potential of DL approaches to increase the accuracy of SVD, as indicated by the growing number of scholarly articles (see Fig. 1). The success of DL for SVD calls for a comprehensive review of the literature so that subsequent researchers can continue to contribute to this promising field.

Table 1 Summary of related surveys
