1 Introduction

Fuzzing, also known as fuzz testing, is a powerful software testing technique that has gained significant attention in the field of software and system security testing. It involves automatically generating a large number of test cases and feeding them into the target program to detect bugs, crashes, or vulnerabilities. Today, fuzzing has emerged as a popular technique in both academia and industry. Some prominent software companies, such as Google  (Abhishek and Cris 2012; Chris et al. 2011; Max and Kostya 2016), Microsoft (Onefuzz 2020), Cisco and Adobe (Brad 2009), have developed their fuzzing tools and have successfully discovered thousands of vulnerabilities in their products. An increasing number of fuzzing studies appear at security and software engineering-related conferences and journals (Godefroid et al. 2008a; Woo et al. 2013). Designed fuzzing tools (also known as fuzzers) open sourced on GitHub and discovered many vulnerabilities in open-source software. Additionally, fuzzing has been widely employed in various renowned competitions, including the DARPA Cyber Grand Challenge (2016).

Fuzzing was proposed by Miller et al. in 1988. It was primarily employed for testing the robustness of UNIX programs (Miller et al. 1995). In 1999, it was extended to encompass security testing. During this period, blackbox fuzzing was predominantly implemented, with notable fuzzers such as PROTOS (Viide et al. 2008), SPIKE (Godefroid 2020), and Peach (Liang et al. 2018a). Blackbox fuzzing generates test cases randomly, with fast testing speed. However, it lacks access to internal program information, limiting the full exploration of deep program logic. In 2008, Godefroid et al. (2008c) developed SAGE, a whitebox fuzzer that combines symbolic execution and fuzzing techniques to generate test cases. Compared to blackbox fuzzing, whitebox fuzzing can generate test cases correlating to particular paths by exploiting program internal information. Nonetheless, software complexity and solver limitations (Avgerinos et al. 2014; Baldoni et al. 2018) present obstacles to the effectiveness of fuzzing in conducting thorough testing within a restricted time frame. Therefore, researchers have shown considerable interest in achieving a balance between the utilization of program internal information and testing efficiency. This has driven the development of greybox fuzzing. At the end of 2013, Zalewski (2013) released a greybox fuzzer American Fuzzy Lop (AFL). AFL uses instrumentation to collect path information from the target program and uses coverage to guide test case generation during fuzzing process, which has become known as coverage-based greybox fuzzing (CGF) (HonggFuzz (2015); Serebryany 2016).

Despite the successes achieved by AFL and CGF in the field of fuzzing, there are still numerous unresolved challenges. One of the primary challenges is the limited comprehension of target programs, especially for complex programs. Fully comprehending the logic and data flow of programs is an arduous task. Consequently, this lack of comprehension impedes the exploration of in-depth paths within the program, thereby restricting the improvement of code coverage (Lou and Song 2020). Another significant challenge arises from the restrictions of fuzzing in modeling specific vulnerabilities. Fuzzing randomly generates test case, bug it frequently lacks the crucial information concerning particular vulnerability features and their locations. As a result, it struggles to accurately simulate and detect certain types of vulnerabilities (Trickel et al 2023). Additionally, the deployment of fuzzing in complex applications, and their testing efficiency are important current challenges (Beaman et al. 2022; Donaldson et al. 2023).

In recent years, fuzzing has shown a trend toward integration, diversification, and open source (Google: ClusterFuzz 2019; Serebryany 2017). Current research on fuzzing mainly focuses on general fuzzing, vulnerability-oriented fuzzing, combining fuzzing with other techniques, and fuzzing for different applications. General fuzzing aims to improve the process of fuzzing to explore deep program paths and improve code coverage. For example, Skyfire (Wang et al. 2017) improves initial test case generation to increase code coverage, AFLFast (BÖhme et al. 2019) improves energy allocation to discover more paths, FairFuzz enhances mutation strategies to improve path coverage, MooFuzz (Zhao et al. 2021) improves seed schedule for better path discovery, and AFLSmart (Pham et al. 2019) focuses on the input format of the target program to generate test cases that conform to the program’s format to explore deep path. General fuzzing has better generality, but they face certain challenges in detecting specific vulnerabilities. To address this challenge, vulnerability-oriented fuzzing focuses on particular vulnerabilities and conducts relevant fuzzing research based on those vulnerability features. For example, MemLock (Wen et al. 2020) focuses on detecting uncontrollable memory consumption and uncontrollable recursive bugs. PerfFuzz (Lemieux et al. 2018) explores algorithmic complexity vulnerabilities by maximizing the edge count in the control flow graph. ConFuzz (Vinesh et al. 2020) considers the characteristics of concurrency vulnerabilities and focuses on detecting this type of vulnerability. Moreover, fuzzing is combined with other security testing techniques such as taint analysis (Bekrar et al. 2012), symbolic execution (Noller et al. 2018), machine learning (Saavedra et al. 2019), and other techniques to improve its testing performance. Fuzzing is also currently being customized for complex applications to uncover potential vulnerabilities and bugs within them.

This overview is motivated by two main points. Firstly, fuzzing has gained significant attention and undergone rapid development in recent years. It has been widely adopted across various applications and extensively utilized by numerous companies and competitions. This highlights the importance and effectiveness of fuzzing in identifying vulnerabilities and enhancing software security. Secondly, there is a lack of comprehensive surveys specifically focused on fuzzing that cover recent advancements and developments. Previous reviews (Li et al. 2018; Liang et al. 2018b; Manès et al. 2019) have provided summaries of fuzzing achievements up until 2018. Other papers (Eisele et al. 2022; Wang et al. 2020) offer systematic reviews of the historical development of fuzzing but tend to concentrate on specific types of fuzzing techniques. There is a necessity for an up-to-date and comprehensive review that encompasses the recent advancements and developments in fuzzing techniques.

This paper presents a comprehensive review of current research on fuzzing. Firstly, an overview of the basic process and classification of fuzzing is provided to offer readers a holistic understanding. The paper then proceeds to introduce CGF as a widely used and representative technique in fuzzing, establishing a solid theoretical foundation and providing technical support for subsequent research advancements. Subsequently, the latest advancements in fuzzing are categorized and discussed, exploring their applications across various domains. Finally, the paper concludes by summarizing the key findings of the reviewed research and future directions.

In this paper, we make the following main contributions.

  • We provide an overview of the processes and classifications of fuzzing, give definitions of CGF and related design details.

  • We discuss the research issues studied in fuzzing and categorize and survey the latest research work.

  • We survey fuzzing techniques in different application scenarios.

  • We summarize the challenges and future research directions of fuzzing.

The rest of the paper is organized as shown in Fig. 1. Background and related work are introduced in Sect. 2. Section 3 surveys recent fuzzing research advancements. This is followed by a review of fuzzing in applications in Sect. 4. Section 5 concludes the paper and discusses future directions.

Fig. 1
figure 1

Paper structure diagram

2 Background and related work

In this section, we first provide the inclusion criteria for the papers covered in this review, then provide an overview of the fuzz testing process, discuss the classification of fuzzing, and introduce the current classic coverage-based greybox fuzzing (CGF), and finally discuss related work.

2.1 Inclusion criteria

We reviewed more than 100 papers, mostly significant works published in top conferences and journals in the software engineering and security field from 2018 to 2023. We also included outstanding fuzzing papers published in industrial conferences, such as Blackhat. To ensure a comprehensive comprehension of the development of fuzzing techniques across various applications, we have gathered a number of classical fuzz testing papers, without any limitations on publication dates. In addition, we have collected top journals papers covering various fuzzing applications to offer a holistic perspective. To clearly define the scope, the inclusion criteria adopted are as follows.

Table 1 Security and software engineering top conference papers
  1. (1)

    We primarily selected recent proceedings from security and software engineering top conferences for the period from 2017 to 2023. The relevant papers are shown in Table 1. The former includes IEEE Symposium on Security and Privacy (S &P), Usenix Security Symposium (Usenix Security), Network and Distributed System Security (NDSS), and ACM Conference on Computer and Communications Security (CCS). The latter includes International Conference on Software Engineering (ICSE), ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), and IEEE/ACM International Conference on Automated Software Engineering (ASE).

  2. (2)

    In addition to the top conference papers related to security and software engineering, the review collects other conferences from industry and academia. The major conferences in industry are mainly Black Hat Europe. Other conferences include not only those related to software engineering and security, but also those related to areas such as machine learning and programming language design. We collect the papers related to fuzzing from that conference to summarize fuzzing applications in different fields. The relevant papers are shown in Table 2.

  3. (3)

    We also select security and software engineering-related journal papers mainly including Computer & Security, Cybersecurity, Empirical Software Engineering, IEEE Transactions on Software Engineering, and International Journal of Computer Science and Network Security. To gather applications of fuzzing in various domains, we also selected papers from artificial intelligence journals to comprehensively and systematically summarize fuzzing techniques. The relevant papers are shown in Table 3.

  4. (4)

    We collected various fuzzing papers from the arXiv platform, which have been cited by many journals and conferences. TriforceAFL (Jesse (2015)), Trinity (Jones (2010)), Syzkaller (Vyukov 2015), are open-source kernel-related fuzzers that are widely used for testing kernel. They are frequently utilized as benchmark tools in numerous kernel fuzzing-related papers. Thus, we present an overview of the primary design concepts of these tools. The relevant papers and web sources are shown in Table 4.

Table 2 Other conference papers
Table 3 Journal papers

2.2 Process of fuzzing

The goal of fuzzing is to generate different inputs and uncover as many exceptions as possible. Before fuzz testing, the input format is known and a target program is obtained. Figure 2 illustrates the general process of fuzzing. The workflow is composed of four main stages, test case generation, target program execution, exception monitoring, and Handling exceptions.

Test Case Generation Test cases can be generated using different methods, including mutation-based and generation-based methods. Mutation-based fuzzing involves taking existing valid test cases or inputs and applying random mutations to generate new test cases. Generation-based fuzzing involves generating test cases from scratch based on predefined templates, grammars, or input specifications.

Target Program Execution Once new test cases are generated, they are sent to the target program for execution. To facilitate monitoring and analysis during fuzzing process, the target program is often instrumented to collect information.

Exception Monitoring During the execution of the target program, it is continuously monitored for program behaviors. This monitoring is essential to identify if the program crashes and hangs. Various techniques can be employed for exception monitoring, such as using signals, crash analysis, and violation detection. Related tools such as Sanitizers (The home for Sanitizers 2019) and MEDS (Han et al. 2018) are commonly used to detect and locate bugs.

Handling Exceptions If the target program encounters an exception during execution, the corresponding test case that triggered the exception is saved for further analysis. These test cases are considered valuable as they can potentially reveal bugs or vulnerabilities in the program.

Table 4 arXiv papers and web sources

Exception Analysis After obtaining the test cases that triggered exceptions, testers analyze and debug target program to obtain the cause of these exceptions. Debugging tools, such as GDB (1988), IDA (2003), and OllyDbg (2000), are commonly used to assist exception analysis.

2.3 Classification of fuzzing

There are different classifications of fuzzing, as shown in Fig. 3.

Fuzzing can be divided into generation-based and mutation-based (Neystadt 2008). Inputs are generated from scratch in generation-based fuzzing (Godefroid et al. 2008b), and it is necessary to provide the expected input specification of target programs. Then, fuzzers construct inputs that violate some regulations to feed target programs according to input specifications. If there is no better input specification, fuzzing might spend more time executing error-handling code and cannot reveal bugs. A mutation-based fuzzing, new test cases are derived from existing seed mutations. Generally speaking, initial valid seeds are provided, and then the fuzzer continuously uses mutation strategies (e.g., bitflip) to generate new test cases which are provided to continuously execute the target program. Compared with generation-based fuzzers, mutation-based fuzzers are relatively simple. However, it is affected by the quality of initial seeds and may be difficult to pass program verification with complex input formats.

Fig. 2
figure 2

The workflow of fuzzing

Fig. 3
figure 3

Classification of fuzzing

Fuzzing can be divided into dumb and smart fuzzing(Neystadt 2008). The dumb fuzzing (Takanen et al. 2018; Ganesh et al. 2009) cannot understand input formats. Inputs are mainly generated using random mutations. Generally, a dumb fuzzer is faster than a smart fuzzer and has a relatively wide range of applications. For instance, AFL is a dumb mutation-based fuzzer. It uses mutation strategies to modify seed files and can fuzz pictures, audio, video, and other processing programs. A smart (model-based (Pham et al. 2016), grammar-based (Godefroid et al. 2008b), or protocol-based (Banks et al. 2006)) fuzzer leverages the input model to generate a greater proportion of valid inputs. For instance, a grammar-based fuzzer can use an abstract syntax tree to build an input model and then use node subtree mutation to transform the current subtree into a new subtree, to meet relevant requirements of grammars. However, smart fuzzers generally require users to provide an input model. The input model is specific and the structure is more complicated.

Fuzzing can be divided into whitebox, greybox, and blackbox fuzzing (Sutton et al. 2007). A whitebox fuzzer (Godefroid et al. 2008c) generally has obtained the source code of target programs. It leverages program analysis to systematically to reach critical locations of programs and increase code coverage. Symbolic execution is commonly employed in whitebox fuzzers. Since the source code of programs is available, symbolic execution is often used to constrain paths to generate test cases that meet specific constraints. Therefore, whitebox fuzzers can detect deeper bugs in the program. A blackbox fuzzer (Edwards 2001) is completely different from the white box fuzzer, it treats the program as a “black box,” and it cannot perceive any information of programs. However, it can only detect bugs on the surface of the program. A greybox fuzzer (BÖhme et al. 2019) is between whitebox and blackbox fuzzer, it uses lightweight instrumentation instead of program analysis to obtain program information that is used to guide fuzzing to improve the efficiency of fuzzing. Because of its simplicity, effectiveness, and reasonable performance, greybox fuzzers have become effective testing tools.

2.4 CGF

In recent years, fuzzing also derives many professional terms, such as seed, seed queue, and mutation strategy. These concepts run through the whole process of CGF, and the related concepts are presented in Table 5.

Based on the principle that better coverage is beneficial to detect more vulnerabilities, CGF uses coverage information to guide the fuzzer to improve coverage. CGF includes two stages: the static analysis stage and the fuzzing loop stage. In the static analysis stage, it executes compile time or dynamic binary instrumentation to obtain the instrumented target program. Algorithm 1 shows the workflow of CGF in the fuzzing loop stage. CGF uses a set of inputs provided by users as initial seeds and selects a seed to enter a continuous loop to fuzz until timeout or program terminates.

  1. (1)

    A seed is selected from a seed queue (Line 4).

  2. (2)

    The energy of seed is allocated (Line 5).

  3. (3)

    The selected seed is mutated to generate a test case using mutation strategy (Line 7).

  4. (4)

    The target program is executed with the generated test case (Line 8).

  5. (5)

    The lightweight instrumentation is used to obtain coverage information to guide fuzzers, if the test case causes a crash, it is marked and added to the crash set (Lines 9–11).

  6. (6)

    If the test case achieve new coverage, CGF adds it to the seed set (Lines 12–14).

In this section, we use a representative fuzzer AFL to introduce the relevant stages of CGF. The framework of AFL can be shown in Fig. 4.

Algorithm 1
figure a

CGF

Table 5 Related concepts of CGF

2.4.1 Instrumentation

Instrumentation is a common technique of inserting code fragments to trace the related information of programs without breaking the program logic. There are two ways of instrumentation, source code instrumentation and dynamic binary instrumentation. The former is inserting the assembly code in source code during the compilation process by static analysis or writing Clang (2007) manually. These correspond to the “\(afl-gcc\)” and “\(afl-clang\)” instrumentation ways in AFL, respectively. The latter is mainly implemented using binary dynamic instrumentation frameworks, such as Pin (Luk et al. 2005), Dynamorio (2015), PaiMei (2016), and Frida (2016).

Fig. 4
figure 4

The framework of AFL

Fig. 5
figure 5

Coverage calculation

Coverage-based greybox fuzzing techniques commonly employ instrumentation to collect coverage information, including edge and basic block coverage. For instance, HonggFuzz (2015) utilizes basic block coverage as feedback information, while AFL adopts edge (branch) coverage for feedback. Before the fuzzing loop stage, it uses instrumentation to insert code fragments. AFL preserves a 64 KB shared memory Bitmap to record edge coverage information. AFL assigns a random number to each basic block in the program at compile time to indicate a unique identifier for the current basic block and uses XOR and right-shift operations on the current basic block and the previous basic block to mark each edge. Each edge is used as an offset of Bitmap and the value is the count of hits.

The specific formula for coverage calculation is as shown in Fig. 5.

2.4.2 Seed selection and power schedule

Seed selection refers to select seeds from the seed pool for future mutation. A perfect seed selection scheme is conducive to speeding up path discovery and bug detection. AFL gives priority to seeds that are unfuzzed and favored.

Power schedule aims at allocating energy to each seed during the fuzzing process, which determines the number of test cases generated by a seed after mutation. Reasonable energy allocation can effectively improve the discovery of new paths. If the energy of a seed is over-allocated, other seed’s mutation will be affected. Conversely, if the energy of one seed is under-allocated, it will be detrimental to path discovery and potential bug detection. AFL has two energy allocations based on different mutation stages.

In the deterministic stage, energy is related to seed length. The longer the seed size, the more energy will be consumed.

In the non-deterministic stage, energy allocation depends on the running time, the number of edges, the average size of seed files, the number of cycles, and others.

2.4.3 Mutation strategy

The mutation strategy determines how to mutate and which part of the seed is mutated. Mutation strategies in AFL consist of two stages: deterministic stage and non-deterministic stage. The former includes bitflip, arithmetic, interest, and dictionary. It is applied to seeds that are selected to mutate for the first time. The latter includes havoc and splice.

The bitflip uses different flip lengths and steps. It includes bitflip 1/1, bitflip 2/1, bitflip 4/1, bitflip 8/8, bitflip 16/8, and bitflip 32/8. In the bitflip, AFL has some heuristic judgments on the file format, such as automatic detection of token and generation of effector map.

The arithmetic performs integer addition and subtraction mutations. It includes arith 8/8, arith 16/8, and arith 32/8, which mean to perform addition and subtraction operators on 8, 16, and 32 bits each time, respectively, starting from the beginning of the seed file according to the step length of each 8 bits. The big endian and little endian are also considered in arithmetic.

The interest performs substitution using pre-define interesting values. It consists of interest 8/8, interest 16/8, and interest 32/8 in steps of 8 bits, which means that starting from the beginning of the seed file, the interesting values of 8, 16, and 32 bits are replaced one at a time, depending on each 8-bit step, respectively.

The dictionary performs substitution using user-provided tokens including user extras (over), user extras (insert), and auto extras (over). The user extras (over) and user extras (insert) indicate to replace and insert into the seed file with the tokens provided by the user, respectively. The auto extras (over) use automatically generated tokens in bitflip to replace seed file.

The non-deterministic stage, in which the havoc randomly selects a random position of seed files to mutate according to mutation strategies in deterministic stage, and the splice splits each of the two seed files in two and splices the head and tail.

2.5 Related work

There are papers that systematically introduce the previous advances in fuzzing, as shown in Table 6. Li et al. (2018) conducted a comparative study that compared fuzzing with other vulnerability discovery techniques and provided an overview of the research achievements in 2018. Liang et al. (2018b) categorized and reviewed relevant papers on fuzzing from 1990 to 2017, based on different classifications and applications. Manès et al. (2019) proposed a unified and general fuzzing model to address the existing inconsistencies in the concepts related to fuzzing and provide a standardized framework for understanding and discussing fuzzing techniques. Beaman et al. (2022) examined the latest developments in fuzzing for vulnerability discovery, proposed a method for classifying fuzzers, and discussed key research challenges and potential future research directions.

Other review papers primarily focused on summarizing fuzzing in specific application domains or providing an overview of a specific type of fuzzing. Wang et al. (2020) provided the first in-depth study of directed greybox fuzzing (DGF). They performed an extensive review of 42 state-of-the-art fuzzers, meticulously categorizing the recent advancements while also providing a comprehensive overview of the associated challenges. Eisele et al. (2022) reviewed embedded fuzzing and proposed a formal definition of embedded fuzzing, and carved out the additional challenges of embedded fuzzing compared to traditional fuzzing. Saavedra et al. (2019) and Wang et al. (2020) reviewed the application of machine learning in fuzzing.

Table 6 A summary of previous surveys on fuzzing
Table 7 General fuzzing

3 State-of-the-art fuzzing

In this section, we first summarize the research questions (RQs) of fuzzing according to the fuzzing process and then answer the following questions by using state-of-the-art fuzzing.

  1. RQ1

    How to get initial seeds?

  2. RQ2

    How to select seeds?

  3. RQ3

    How to assign energy for seeds?

  4. RQ4

    How to mutate seeds and select mutation strategies?

  5. RQ5

    How to quickly detect specific bugs?

  6. RQ6

    How to integrate other techniques to improve the performance of fuzzers?

3.1 General fuzzing

General fuzzing mainly optimizes and improves the greybox fuzzing process to improve code coverage and find more program exceptions. Its main research challenges include how to select the initial seed, how to select the seed from the seed queue, how to assign energy to the seed and what mutation strategy to adopt, based on the above research questions, the related works are shown in Table 7.

3.1.1 Initial seed selection (RQ1)

The initial seeds are user-supplied test cases prior to fuzzers, which is extremely important for mutation-based fuzzing. Because the test cases are generated by mutating the existing seeds. Ideally, a high-quality seed requires meeting the following conditions: (1) There is a good format that matches target applications. (2) The initial seed size should be as small as possible under the condition of meeting the rules so that the efficiency of running is high. (3) The initial seeds can generate test cases with deep bugs after mutation. (4) Good seeds can be used many times in constant fuzzing.

Table 8 Different fuzzers with seed selection optimization

Based on the above conditions, initial seeds are generally based on the following selection criteria: (1) Some well-formatted data sets Fuzzdata (2015). (2) The proof of concept (POC) CVE (2016) of target applications. Well-formatted data sets are generally available. Some target applications provide some test cases to aid fuzzing. Specific types of applications, such as image, audio, and video processing programs (Ju et al. 2021), have related image, video, and audio libraries that can be selected as the initial seed set. Users can also use online crawlers and other techniques to crawl the required data as initial seeds after filtering, trimming, and other processing. Generally, a POC that triggers vulnerabilities in the older version of applications is a good seed, because the vulnerability is relatively risky, even if it is repaired, there may be risks. Nowadays, many testers have published some POCs of target applications on GitHub and other platforms that provide good initial seed sets. Researchers have also studied initial seed generation, and Wang et al. (2017) proposed Skyfire in the 2017 S &P conference, the main idea is to use a large number of existing samples to learn probabilistic context-sensitive grammars and generate highly structured test cases as the initial seeds of AFL, perform fuzzing and detect many vulnerabilities.

3.1.2 Seed selection optimization (RQ2)

Fuzzing generates high-quality test cases such as new coverage or significant features called seeds and saves them in the seed queue, how to choose which seed to use in the next round from the seed queue seriously affects fuzzing efficiency. The research related to seed selection optimization is presented in Table 8.

Table 9 Various fuzzers with power schedule

Seed selection optimization can drive fuzzing to evolve in different directions. VUzzer (Rawat et al. 2017) uses static analysis to extract control flow graphs, assigns weights to basic blocks in the control flow graphs, computes the weights of the seeds to quantify the depth of their execution paths, and prioritized the seeds with deeper execution paths. Gan et al. (2018) developed CollAFL, which uses lightweight instrumentation to solve the hash collision problem, and proposed untouched-neighbor-branch guided policy, untouched-neighbor-descendant guided policy, and memory-access guided policy three seed selection strategies to improve code coverage and find more bugs. Cerebro (Li et al. 2019) measures code complexity and selects seeds by combining various attributes such as the code complexity, the coverage, and the execution time. Wang et al. (2020) proposed a fuzzer called Tortoisefuzz. Based on the observation that not all coverage measurements are equal, Tortoisefuzz is proposed to discover memory corruption from function calls, basic blocks, and loops. The instrumentation is modified to optimize the input of dangerous functions, memory read and write in basic blocks, and loops to obtain sensitive edge information. The obtained information is used to guide seed optimization to find more vulnerabilities. K-Scheduler (She et al. 2022) models the seed selection problem for fuzzing as a graph centrality analysis problem, constructing an edge horizon graph using a control flow graph and using the Katz centrality to compute the centrality score to approximate the number of uncontrolled flow graph edges that are reachable and feasible for the seed from a starting point, and preferentially scheduling the uncontrolled but potentially reachable CFG edges with seeds that have more. MooFuzz (Zhao et al. 2021) integrates vulnerability and improved coverage perspectives, divided the seed pool into exploration, search, and evaluation three states and employed different many-objective optimization schemes for seed selection for the different states.

Deep learning, machine learning, and other techniques have also been used to aid fuzzing for seed selection optimization. NeuFuzz (Wang et al. 2019) uses neural network models to predict whether a path is vulnerable or not, prioritize vulnerable path-related seeds are prioritized and seeds are selected. MEUZZ (Chen et al. 2020a) uses the use of supervised machine learning to select seeds based on past knowledge learn from past seed scheduling decisions on the same or similar procedures, to select seeds that are expected to yield a greater yield to improve the efficiency of hybrid fuzzing. Wang et al. (2021) proposed a multi-level coverage measurement approach, which models the fuzzing process as a multi-armed bandit model. They utilized the upper confidence bound (UCB1) algorithm to score the seeds based on the rarity of the seeds and the difficulty of generating new coverage paths through seed mutations on a hierarchical tree. The seeds with higher scores are selected.

Directed greybox fuzzing is designed to perform tests on pre-selected or potentially vulnerable target locations, which are mainly used in various scenarios such as vulnerability recovery and patch testing. Therefore, seeds are also often selected based on target locations in directed fuzzing. AFLGO (Böhme et al. 2017) is a directed fuzzer that dynamically calculates the distance between a seed and a user-given target location, and prioritizes seeds that are closer to the target location. Hawkeye (Chen et al. 2018) addresses the limitations of the fuzzer AFLGO by defining more accurate distances, preferring seeds that cover new branches and have greater similarity to the objective function and target point. TOFU (Wang et al. 2020) is a target-oriented directed fuzzer that performs structured mutations using knowledge of the input structure provided by the user, based on which the distances of the basic blocks are calculated and seeds that are more likely to reach the target location are selected.

3.1.3 Power schedule (RQ3)

Power schedule is to allocate energy to seeds. Energy determines the number of test cases generated by the seed mutation. Proper allocation of energy to seeds not only gives other seed mutation opportunities, it also improves the testing efficiency. The research related to power schedule is presented in Table 9. AFLFast (BÖhme et al. 2019) models the fuzzing through Markov chain and uses transfer probability to represent the probability of seed mutation to generate other test paths, gives the concept of high-frequency and low-frequency paths, and preferentially allocates more energy to the seed executing low-frequency paths to improve the efficiency of fuzzing. EcoFuzz (Yue et al. 2020) selects seeds according to the divided states of the seed pool and models the seed energy allocation problem as a multi-armed bandit problem, using reinforcement learning to allocate energy to seeds. RegionFuzz (Situ et al. 2021) adopts code metrics to evaluate vulnerable regions in the code and allocates more energy to the seeds that reach the region to improve vulnerability detection. OTA (Li et al. 2021) transforms the energy allocation of the deterministic stage of AFL into a mutation time allocation problem by using a particle swarm optimization (PSO) algorithm to optimize the mutation time. MobFuzz (Zhang et al. 2022) selects execution speed, memory consumption, and deep nested branches as optimization objectives, models fuzzing as a multi-armed bandit model, and allocates reasonable energy to seeds based on different combinations of objectives. AFL++hier (Wang et al. 2021) uses a multi-armed bandit model to allocate energy among different clusters of seeds with multi-level coverage metrics. AFLGo (Böhme et al. 2017) uses the simulated annealing (SA) algorithm to allocate more energy to seeds that are closer to the user’s given target. SLIME (Lyu et al. 2022) uses an upper confidence bounds-variance aware algorithm to adaptively allocate energy based on the path and crash discovery capabilities of attribute queues by estimating the potential reward of each attribute queue.

3.1.4 Mutation strategy (RQ4)

The mutation strategy determines where the seeds are mutated, how they are mutated, and how the mutation operator is chosen. The research related to mutation strategy is presented in Table 10. FairFuzz (Lemieux and Sen 2018) uses new mutation strategies to address the limitation of low coverage of AFL. It identifies branch paths that are less frequently executed as rare branches and a mutation mask algorithm that allows mutations is proposed to reach rare branches, improving code coverage. AFLTurbo (Sun et al. 2020) employs interruptible mutation to determine mutation time, locality-based mutation to reduced mutation regions, and hotspot-aware fuzzing to identify metadata to reduce mutation overhead and improve code coverage. ProFuzzer (You et al. 2019) performs byte-level mutation and observes the results of fuzz testing to delineate input fields and infer field types, guiding seed mutations based on different field types to improve path discovery and vulnerability detection.

Table 10 Various fuzzers with mutation strategy

To generate test cases that conform to the desired format, multiple heuristic mutation strategies have been proposed to generate valid inputs. Superion (Wang et al. 2019) adds two mutation strategies to AFL to generate structured inputs, including enhanced dictionary-based mutation and tree-based mutation. Enhanced dictionary-based mutation locates the boundaries of tokens by checking whether the bytes of test cases are consecutive alphabet or digit and then inserting tokens in dictionary to every boundary. Tree-based mutation replaces the abstract syntax tree of the parsed input with a subtree. These strategies improve the effectiveness of test case generation while reducing the number of mutations. AFLSmart (Pham et al. 2019) uses a high-level virtual structure to represent seed files and divided them into chunks, then executing chunk-level smart deletion, addition, and splicing operators based on virtual input structure, to generate new valid inputs. GRIMOIRE (Blazytko et al. 2019) implements a syntax inference that is used to automatically fuzz highly structured formats during the fuzzing process without human interaction or program modification. It modifies inputs by removing element parts of the input and marks the element parts that are the same as the original input in achieving coverage, to learn the structure of inputs, and performs large-scale mutations on learned structures to generate structural inputs. SLF (You et al. 2019) performs a dependency analysis to generate valid seed inputs. Based on AFL, it flips each byte of random inputs to generate new test cases to execute the program. The mutated consecutive bytes affect the same comparison in constantly mutating and are marked as identical fields of inputs. Based on the information of input fields, the corresponding input checks are classified, including arithmetic, index/offset, count, and ITE (\(If-Then-Else\)) checks. A multi-goal search algorithm is used to mutate inputs to satisfy inter-dependent checks to generate valid seeds.

Sanity checks often appear in programs, such as checks on magic bytes, checksums, hashes, and others. Magic bytes are bytes that are used for comparison instructions in the inputs, such as string equality comparison. It is difficult for fuzzing to pass magic byte comparison. To pass the sanity check, corresponding mutation strategies have been proposed. Steelix (Li et al. 2017) is used to generate test cases to pass magic bytes. It collects coverage information and performs lightweight static analysis and binary instrumentation to finish comparisons. For certain multi-byte magic numbers, Steelix can accurately detect single-byte matching. By fixing its corresponding generated correct byte position, Steelix traverses the front or back bytes exhaustively to generate the corresponding magic byte, thereby reducing the mutation space. REDQUEEN (Aschermann et al. 2019) uses a lightweight tracking-based technique to generate test cases that can pass the magic byte and checksum. Based on a simple assumption that part of the inputs is directly transferred to memory or register for comparison at runtime, a strong input-state correlation exists between input and current state. REDQUEEN first tracks and recognizes the comparison instructions during the program execution, and then determines which part of the input affects the change of memory and register, and finally mutates to generate new coverage. T-Fuzz (Peng et al. 2018) uses deletion of the sanity check in target programs to ensure that fuzzing execution when fuzzing has no new coverage. The technique based on symbolic execution is used to filter false positives and reproduce the real bug in the original program.

Many fuzzers incorporate taint analysis techniques, which gather data flow information and infer which bytes in the input can influence program execution. This guidance is used to guide seed mutations, resulting in improved test case generation and increased code coverage. Angora (Chen and Chen 2018) uses byte-level taint tracking, context-sensitive branch count, search based on gradient descent, and input length exploration instead of symbolic execution to generate high-quality input to solve path constraints and increase branch coverage. Matryoshka (Chen et al. 2019a) identifies the control flow dependencies and taint flow dependencies between conditional statements and then uses three strategies, including prioritizing reachability, prioritizing satisfiability and jointing optimization for both reachability and satisfiability to solve path constraints that involve deeply nested conditional statements. Greyone (Gan et al. 2020) uses a lightweight taint inference to guide fuzzing. During the fuzzing process, the taint of variables in the program is inferred by changing the bytes of inputs. A taint-guided mutation is used to determine which bytes to mutate, which branches to explore, where to mutate, and how to mutate, and then using constraint conformance calculation to guide seed selection to explore hard-to-reach branches. PATA (Liang et al. 2022) employs dynamic taint analysis techniques to identify the dependency between input and paths. It mutates the input bytes that affect the dependency relationship to resolve constraints.

A smart mutation strategy schedule can be highly fuzzing efficient. Based on different mutation strategies that have different effectiveness for different programs, MOPT (Lyu et al. 2019) uses a particle swarm optimization algorithm to perform mutation strategy schedule. It treats each mutation operator as a particle and iteratively updates the probability of each particle through the local optimum and the global optimum. Then, MOPT integrates the updated probabilities of all particles to obtain a new probability distribution. CMFuzz (Wang et al. 2021) uses a context-aware mutation method, which dynamically extracts byte stream features for the seed file during the fuzzing process, and then uses a multi-armed bandit algorithm to select the optimal mutation operation. AMSFuzz (Zhao et al. 2022) models the mutation operator selection problem in fuzzing as a dodging slot machine model that adaptively selects mutation operators to improve the efficiency of path discovery and vulnerability detection.

3.1.5 Summary of general fuzzing

The main purpose of general fuzzing is to optimize the process of greybox fuzzing to improve the quality of test cases, increase code coverage, and detect more vulnerabilities. The current research status can be summarized into four aspects: initial seed selection, seed selection optimization, energy allocation, and mutation strategies. Initial seed selection involves selecting the initial seed inputs for mutation-based fuzzing. This is achieved through methods such as crawling, historical POCs, and grammar-based learning. Seed selection optimization is a crucial aspect of general fuzzing. Current research utilizes techniques such as static analysis, intelligent optimization, and machine learning to evaluate seeds based on criteria such as coverage improvement, vulnerability triggering potential, and reaching target locations. This evaluation helps determine the priority of seed selection, thus improving testing efficiency. Energy allocation refers to the quantity of generated test cases through seed mutation. Current research employs machine learning, optimization algorithms, and reinforcement learning techniques to allocate reasonable energy based on seed evaluation results. Mutation strategies involve specific methods for mutating seeds. Current research utilizes techniques such as taint analysis, heuristic strategies, and mutation optimization to determine mutation positions, methods, and how to optimize mutation strategies to generate well-formed test cases, thereby improving the quality and efficiency of mutations.

With the increasing complexity of software, general fuzzing research demonstrates good generality and is continuously evolving and optimizing. However, it still faces challenges in detecting specific vulnerabilities.

3.2 Vulnerability-oriented fuzzing (RQ5)

General fuzzing has shown good generality, but it lacks an advantage in discovering specific vulnerabilities. Therefore, vulnerability-oriented fuzzing aims to uncover specific types of vulnerabilities. Its core idea is to analyze the behaviors related to vulnerabilities, guide fuzzing to satisfy the corresponding behaviors, thereby enabling faster discovery of vulnerabilities of that type. Based on different types of vulnerabilities or bugs, relevant research is presented in Table 11.

Table 11 Fuzzers that detect various vulnerabilities

3.2.1 Uncontrollable memory consumption & uncontrolled recursive

MemLock (Wen et al. 2020) is utilized for the detection of uncontrollable memory consumption and uncontrolled recursive bugs. By employing instrumentation techniques within the fuzzing process, MemLock continuously gathers information on memory consumption and recursive function calls. Test cases that achieve new coverage and cause increased memory consumption or a higher number of recursive calls are selectively chosen by MemLock. Selected test cases are then added to a seed pool and assigned a higher priority, aiming to effectively trigger vulnerabilities.

3.2.2 Integer overflow and array overflow

TIFF (Jain et al. 2018) employs dynamic taint analysis to identify data types associated with input offsets during the program execution. It performs not only coverage-oriented mutation but also bug-oriented mutation, combining input types and bug trigger conditions, using a custom mutation strategy to generate test cases to detect two types of memory corruption (integer overflow and array overflow).

3.2.3 Memory vulnerability

ovAFLow (Zhang et al. 2022) utilizes static analysis to identify potential locations in the program that may lead to memory vulnerabilities. Specifically, it focuses on memory manipulation function parameters and memory loop variables. By employing taint inference, ovAFLow establishes the corresponding relationships between input bytes and these identified locations. It then guides the fuzzing process by mutating seeds to trigger memory vulnerabilities within the program.

3.2.4 Consistency error

COMFORT (Ye et al. 2021) is used to detect consistency vulnerabilities in JavaScript engines. It uses a DNN-based deep learning language model, GPT-2, to generate random JS programs, and then extracts the boundary conditions of the test programs related to JS API from the ECMA-262 specification to guide the generation of multiple test case programs. The generated test case programs are sent to multiple JS engines to detect consistency issues through inconsistency analysis.

3.2.5 Use-after-free

The triggering of use-after-free (UAF) vulnerabilities requires the sequential execution of three specific operations: memory allocation, free, and memory use. Currently, fuzzing techniques face difficulties in detecting UAF vulnerabilities. To address this problem, UAFL (Wang et al. 2020) is proposed to detect UAF. It identifies operation sequences by static typestate analysis, and not only collects the control flow information and also collects operation sequence information to guide generate seeds that cover operation sequences, to detect UAF. UAFuzz (Nguyen et al. 2020) employs bug trace flattening to extract serialized basic blocks and functions related to UAF vulnerabilities from binary programs. It prioritizes seeds that are similar to these blocks and functions and allocates more energy to them to detect UAF vulnerabilities. MDFuzz (Zhang et al. 2021) uses static analysis to identify the locations of memory allocation, deallocation, and access as targets. It calculates the distance from each basic block to the targets and selects seeds based on this distance, thereby enhancing the triggering of UAF vulnerabilities.

3.2.6 Algorithmic complexity vulnerability

Algorithmic complexity vulnerabilities occur when the worst-case time or space complexity of an application is significantly higher than the average case for specific user-controlled inputs. To detect algorithmic vulnerabilities, PerfFuzz (Lemieux et al. 2018) first extracts control flow graphs to collect information on each edge of the target program by improving instrumentation. Then, PerfFuzz selects test cases that maximize execution counts of each edge in the control flow graph (CFG) as the pathological input. SlowFuzz (Petsios et al. 2017) is based on libFuzzer (Serebryany 2016) and prioritizes inputs that increase total path length to detect algorithmic vulnerabilities.

3.2.7 Concurrency vulnerability

To detect concurrency vulnerability, Liu et al. (2018) proposed a heuristic framework that uses static analysis to find sensitive concurrent operations and determines the order of execution of sensitive operations. Each specific execution sequence is explored to trigger a potential concurrency vulnerability. ConFuzz (Vinesh et al. 2020) uses static analysis to traverse the paths in the program. Each basic block in the path is instrumented and then assigned an id and a weight based on the distance of basic blocks to thread functions. During the fuzzing process, the information related to basic blocks is used to generate new test cases, cover more basic blocks, and detect concurrency vulnerabilities with the aid of thread sanitizer ThreadSanitizer (2019). A greybox fuzzing framework for multi-threaded programs, MUZZ (Chen et al. 2020) used coverage instrumentation, thread context instrumentation, and thread scheduling instrumentation to explore the execution state of multi-threaded programs and then prioritizes those seeds that have explored new code coverage or thread contexts to detect concurrency vulnerabilities.

Table 12 Fuzzers that integrate different techniques

3.2.8 Side channel attack

Side channel attack exploits information leakage observed during the execution of certain operations to undermine security measures. To detect side channel vulnerabilities, DifFuzz (Nilizadeh et al. 2019) employs a resource-guided heuristic algorithm to test two different versions of an application. During the fuzzing process, DifFuzz generates test cases that maximize differences in resource consumption between the versions, such as time, memory, and response size, to detect side channel attacks. Brennan et al. (2020) proposed a fuzzing technique for detecting timing side channel vulnerabilities in Java programs caused by just-in-time compilation.

3.2.9 Summary of vulnerability-oriented fuzzing

Vulnerability-oriented fuzzing refers to fuzz testing based on the known features and triggering patterns of specific vulnerabilities to improve the efficiency of detection. Based on the above research, vulnerability-oriented fuzzing firstly requires to model known vulnerabilities and extract the vulnerability information to guide test cases generation to detect the type of vulnerability. Common methods include static analysis, dynamic analysis, deep learning, and other techniques to analyze and extract semantic information, contextual information, and potential locations of vulnerabilities. Based on the obtained information, techniques such as instrumentation and taint analysis are employed to improve and optimize seed selection, energy allocation, mutation strategies, and guide fuzzing in generating test cases that better uncover the targeted vulnerabilities. Therefore, vulnerability-oriented fuzzing is of significant importance in detecting specific types of vulnerabilities.

3.3 Combining fuzzing with other techniques (RQ6)

To enhance the efficiency of fuzzing, various techniques such as symbolic execution, taint analysis, parallel, and instrumentation are commonly integrated. Table 12 provides an overview of related work that incorporates these techniques into fuzzing.

3.3.1 Symbolic execution

Hybrid fuzzing adds symbolic execution based on traditional fuzzing. It is one of the current popular fuzzing branches. Driller (Stephens et al. 2016) consists of a greybox fuzzer AFL and a symbolic execution engine Angr (Wang and Shoshitaishvili 2017). AFL can quickly generate test cases to fuzz target programs. Angr can solve the constraints in the program and generate new test cases only when AFL does not find new paths. Driller combines the advantages of AFL and the dynamic symbolic execution Angr, it avoids the difficulty of AFL to break through special boundaries and the problem of dynamic symbolic execution path explosion. QSYM (Yun et al. 2018) is a concolic testing engine tailored for hybrid fuzzing. To improve performance, it does not translate the target program into an intermediate representation but uses a dynamic binary instrument framework, Inter Pin (Luk et al. 2005), to add symbolic tracing to the target program and employs pruning basic blocks and eliminating extraneous constraints to increase the speed of hybrid fuzzing. However, it only supports x86 system architectures.

Most fuzzers are mainly coverage guided that waste a lot of time on codes without bugs and it is difficult to reach protected codes that have complex conditions. A hybrid fuzzing framework, SAVIOR (Chen et al. 2020b), is designed based on bug-driven principles. SAVIOR uses the UBSan UndefinedBehaviorSanitizer (2019) to label potential bugs for the target program and uses static analysis to find the protected code region. During the fuzzing process, it optimizes seed selection according to unexplored branches, bug labels, and difficulty degree in branch exploration. Unlike other hybrid fuzzers, SAVIOR also adds bug-guided verification to verify all possible vulnerabilities in the execution path to ensure that no vulnerability is missed. DigFuzz (Zhao et al. 2019) is a probabilistic hybrid fuzzer, which is used to address the problems of schedule between concolic execution and fuzzing. It uses Monte Carlo path optimization to quantify the difficulty of path and assigns those paths for concolic execution. Since the overhead of hybrid fuzzing is huge, PANGOLIN (Huang et al. 2020) used polyhedral path abstraction to reuse the values solved by the constraint solver based on traditional hybrid fuzzing, which can improve the efficiency of constraint solving while using the results of reusing constraint solving to guide mutation in fuzzing. QuickFuzz (Lin et al. 2021) quantifies two factors of path solution demand and solution cost, and adopted a priority-based path searching method to select the missing path to execute the mixed execution, to improve the performance of hybrid fuzzing.

3.3.2 Parallel and integration

Parallel and integration are also often used in fuzzing to improve the efficiency of fuzzing. PAFL (Liang et al. 2018) extends the existing single-mode fuzzing optimization to an industrial parallel mode using efficient guidance information synchronization and task partitioning, allowing multiple fuzzers to work in parallel. EnFuzz (Chen et al. 2019b) implements seed synchronization to improve the effectiveness of different fuzzers, integrating multiple fuzzing strategies to improve the performance and versatility of fuzzers. Cupid (Güler et al. 2020) collects and applies empirical data from a single isolated fuzzer and automatically identifies and selects a set of fuzzers that complement each other during collaborative execution for parallelized and distributed fuzzing. AFL++ (Fioraldi et al. 2020) integrates recent research results based on AFL, adding customized mutation APIs, achieving better fuzzing speed, and supporting customized modules.

3.3.3 Instrumentation

To reduce the overhead caused by coverage tracking, UnTracer (Nagy and Hicks 2019) adds interrupt instructions before each uncovered basic block of target programs to construct a new program. Each generated test case is sent to the new program to execute, and if an interrupt is triggered, the coverage tracking will be performed subsequently, otherwise, the test case is discarded. After coverage tracing, the previous basic block that has not been reached before will be determined, and the triggered interrupt instruction is deleted to avoid invalid tracing.

Reducing instrumentation overhead can improve the performance of fuzzing. Zeror (Zhou et al. 2020) uses two mechanisms to improve the performance of fuzzing, the self-modifying tracing mechanism and the binary switch scheduling mechanism. The former is used to maintain a set of unvisited instrumented points in the fuzzing process, and once an instrumented point is visited, the point would be removed, reducing the overhead caused by instrumentation. The latter provides a multi-granularity binary switch scheduling using a Bayesian approach to switch the different coverage granularity instrumentation to better detect vulnerability.

3.3.4 Other techniques

In the research of general fuzzing and vulnerability-oriented fuzzing, fuzzing can also be combined with taint analysis, deep learning, machine learning, intelligent optimization, and other techniques to enhance the efficiency of fuzzing. The techniques has been discussed in Sects. 3.1 and 3.2.

3.3.5 Summary of fuzzing integration with other techniques

The integration of fuzzing with other techniques is one of the important directions in current research on fuzzing. Related techniques include symbolic execution, taint analysis, parallel and integration techniques, static analysis, intelligent optimization, deep learning and machine learning. Symbolic execution can obtain the input corresponding to a specific path by analyzing the program, but faces aces difficulties such as path explosion and inefficient solving. On the contrary, fuzzing can generate a large number of test cases to quickly cover more program paths, but it is difficult to ensure the efficiency of testing for specific paths. Existing research combines symbolic execution with fuzzing to take advantage of the strengths of each technique and improve overall testing efficiency. Taint analysis can trace sensitive data flows from the sources to sink, determine which inputs can influence the program’s execution. Existing research combines taint analysis and fuzzing to determine mutation locations of seeds to generate high-quality test cases and improve testing efficiency. Parallel and integration are incorporated into fuzzing can save testing and enhance fuzzing efficiency. Other techniques, such as intelligent optimization, deep learning, and machine learning, are also combined with fuzzing to assist in program analysis, power schedule, test case generation, ultimately improving the efficiency of vulnerability discovery.

4 Fuzzing: different applications

In this section, we discuss the research progress of fuzzing in diverse applications, including SMT solver, virtual machine monitor, kernel, smart contract, protocol, and machine learning model. Table 13 shows the fuzzing of different applications.

Table 13 Fuzzing of different applications

4.1 SMT solver

Satisfiability modulo theories (SMT) solvers such as CVC4 (2021), Z3 (2015) are core and complex components of program analysis, which have widely been used in many applications, such as formal verification, security analysis, automated theorem proving, and symbolic execution. SMT solvers are used to evaluate the satisfiability of SMT instances. It is a complex system combining multiple decision procedures for various theories, such as uninterpreted functions, linear/nonlinear arithmetic, bit vectors, arrays strings, and others. It is difficult to find issues in the solvers and many works start to fuzz SMT solvers by generating SMT instances continuously. Fuzzers for SMT solver are shown in Table 14.

Stringfuzzer (Blotsky et al. 2018) is an open-source automated test string SMT solver fuzzer. It can generate and transform SMT instances with string or regular expression constraints, but the satisfiability of the instances generated by Stringfuzzer is unknown. A new string solver fuzzer, StringSolversTests (Bugariu and Müller 2020) is proposed which can construct satisfiable or unsatisfiable SMT instances with known satisfiability truth. These instances are used as inputs to fuzz SMT string solvers, to discover soundness and performance, completeness, and other bugs. YinYang (Winterer et al. 2020) is a mutation-based fuzzer that mutates a set of seed formulas, and then uses the mutated formulas as inputs to fuzz SMT solvers. The tool can detect soundness bugs, crashes, invalid model bugs, segmentation faults, and other bugs. STOROM (Mansur et al. 2020) is a blackbox mutation fuzzer. It first uses seed fragments to decompose formulas in seed instances into subformulas, then recombines these subformulas to generate new formulas, and finally the generated formulas are used to create a new, satisfying SMT instances to detect critical bugs in any SMT solver. A lightweight opfuzzer (Winterer et al. 2020) is proposed which uses type-aware operator mutation to generate test cases that meet the requirements and verify the results through different tests. BanditFuzz (Scott et al. 2020) uses reinforcement learning to automatically isolate and arrange grammatical structures in the input to explore the cause of errors or performance problems in the floating-point and string solvers.

4.2 Virtual machine monitor

Virtual machine monitor (VMM), also known as a hypervisor, is a core component of cloud computing and is used to virtual CPU, memory, I/O, and devices. Due to the diverse interfaces and complex architecture of virtual machines, as well as different interaction states, it makes direct fuzzing is not effective by using traditional fuzzing. Fuzzers for VMM are shown in Table 15.

Virtual devices are stateful and only work properly when properly initialized. They have specific behaviors during normal running. To fuzz virtual device, VDF (Henderson et al. 2017) uses the record and replay methods to detect virtual device bugs by first collecting the initialized and normal running behaviors of the device, then mutating and replaying the normal running behaviors. Jack and Li (2016) designed a framework for fuzzing virtual devices. It uses AFL to obtain coverage information and customizes lightweight customized BIOS to achieve portable and efficient fuzzing testing. Schumilo et al. (2020) presented a fuzzer for hypervisors, HYPER-CUBE, which can be applied to both open and closed source hypervisors. Based on a custom operating system, a custom bytecode interpreter is deployed which does not use coverage-guided fuzzing but is able to achieve high throughput, efficiency, and code coverage. NYX (Schumilo et al. 2021) is a coverage guidance hypervisor fuzzer. It uses instrumentation tool Inter Pin (Luk et al. 2005) to obtain coverage information and leverages a fast snapshot reload mechanism to obtain the virtual machine state generated by the previous test cases. To generate better test cases, a mutation engine based on bytecode programs is implemented that users can define a specification to generate inputs for different interfaces, and the concept of affine types is proposed to narrow down the space for test case generation.

Table 14 Fuzzers for testing SMT solver

4.3 Kernel

Kernel is an important and complex piece of system software for computers. Kernel security is usually a hot topic in some community forums. Many kernel fuzzers have been developed to test the kernel subsystem, such as file system, memory management, and device drivers and used to solve special problems and specific vulnerabilities in kernel. Fuzzers for kernel are shown in Table 16.

Trinity (Jones 2012) is a Linux system call fuzzer that uses the type information parameters provided in the Linux system call prototype definitions to passed parameter to the system calls drives the generation of test cases. However, Trinity does not keep track of coverage. Dmitry et al. Vyukov (2015) took this problem into account and developed an unsupervised coverage-guided kernel fuzzer, Syzkaller. Syzkaller uses the gcc port of the address sanitizer to keep track of coverage and supports other OS kernels as well. TriforceAFL Jesse (2015) used QEMU to implement code coverage and added serialization techniques to kernel APIs, and extended AFL to support fuzzing kernels. IMF (Han and Cha 2017) is a model-based API fuzzer that takes API calls context into account and generates random but well-structured seeds by inferring dependencies between individual APIs and combining with a mutation engine. KAFL (Schumilo et al. 2017) is a coverage-guided kernel fuzzing tool that utilizes one new feature in Intel CPU-provided hardware called Intel processor trace (PT). This hardware feature allows the CPU to gather branch tracing information to maximize code coverage within a limited time. Moonshine (Pailoor et al. 2018) implemented a seed distillation algorithm that uses static analysis to identify dependencies between function calls and then grouping them to generate seeds that ensuring code coverage.

Table 15 Fuzzers for testing VMM
Table 16 Fuzzers for testing kernel

The device drivers provide software interfaces to access hardware. Fuzzing device drive is concerned by researchers. Corina et al. (2017) proposed DIFUZE, an interface-aware fuzzing tool focusing on the ioctls interface provided by device drivers, which utilizes static analysis techniques to generate legal input sequences and track the execution of the driver. Song et al. (2019) presented PeriScope, a probing framework, suitable for detecting vulnerabilities that can be exploited by peripheral devices which lack underlying safeguards, in particular memory corruption bugs and double-fetch vulnerabilities, by analyzing the interaction between device drivers.

Aiming at kernel-specific challenges, Kim et al. (2020) designed a hybrid kernel fuzzer, HFL. It improves hybrid fuzzing to target the kernel’s characteristics by shifting control transfer from implicit to explicit, performing inference of system call sequences to establish consistent system states, and identifying the nested parameter types of system calls. This ultimately improves the efficiency coverage of hybrid fuzzing. Considering the problem of data race in concurrency, Jeong et al. (2019) designed Razzer to achieve the efficient discovery of race in kernel using static analysis and deterministic thread interleaving techniques. Static analysis is used to analyze potential locations that exist data race in the source code. Deterministic thread interleaving is used to control thread schedule to provide accurate parallel execution information and reduce uncertainty. Xu et al. (2020) designed a coverage-based fuzzer, KRace, which replaces path coverage with alias coverage to achieve accurate data race detection.

4.4 Smart contract

Smart contracts have different characteristics compared to traditional applications, which presents a whole new challenge for the fuzzing of smart contracts. Firstly, the execution state of smart contract programs for different test cases is passed through the global Storage store and affects each other. Thus, it is difficult to improve global coverage by providing coverage guidance feedback for test cases of individual functions. Secondly, the vulnerabilities may come from different levels of the blockchain, virtual machine, and high-level language, and there are many differences between them, which makes it challenging to detect vulnerabilities in smart contracts. Fuzzers for smart contract are shown in Table 17.

ContractFuzzer (Jiang et al. 2018) is a fuzzer that detects vulnerabilities in smart contracts. It instruments environment virtual machine (EVM) to log the runtime information of smart contracts and generates test cases that meet smart contracts grammars by learning the contract application binary interface specifications to fuzz smart contracts. Seven test oracles are defined to detect types of Ethereum smart contract vulnerabilities including gasless send, exception disorder, reentrancy, timestamp dependency, block number dependency, and dangerous delegatecall, and freezing ether vulnerabilities. ILF (He et al. 2019) is a neural network-based fuzzer for smart contracts, which aims at generating better test cases and transaction sequences. It uses a symbolic execution engine to generate a large number of transaction sequences and then trains a neural network model that captures a probabilistic fuzzing policy for generating test cases. To solve the problems of hard constraints in execution and ignoring blockchain properties on smart contracts, EthPloit (Zhang et al. 2020) generates transaction sequences to fuzz smart contracts by instrumenting EVM to better simulate blockchain behaviors and using dynamic seeding strategies to solve hard constraints.

4.5 Protocol

A protocol is a set of agreements that both sides of a communicating computer must mutually adhere to. The implementation of protocols ensures that services can be provided at a higher level. As the service has a large number of space states, it is necessary to traverse these states using a sequence of input messages, which also poses difficulties and challenges for testing protocols. There are many security issues in protocols that can easily cause serious problems, such as information leakage, denial of service, and others. Due to the diversity of protocols, the complexity of the protocol state space, and the dependency of protocol state, detecting protocol-related problems is a challenge. Fuzzers for protocol are shown in Table 18.

Table 17 Fuzzers for testing smart contract
Table 18 Fuzzers for testing protocol

AutoFuzz (Gorbunov and Rosenbloom 2010) is an automated network protocol fuzzing framework that first constructs a finite state automaton (FSA) to capture communications between client and server to understand the implementation of protocols, and then learning the individual message syntax. Finally, AutoFuzz uses the FSA as a guide to fuzz the client and server protocols by modifying the input traffic. SECFUZZ (Tsankov et al. 2012) uses a modular fuzzing approach to fuzz stateful security protocols that handle encrypted traffic, and uses a series of custom mutation operators to generate test cases to detect security vulnerabilities. AFLNet (Pham et al. 2020) is a greybox fuzzer for protocols based on AFL. It uses code coverage feedback and state feedback to guide the fuzzing process.

Aiming at industrial protocol, some fuzzing methods have been proposed. Zhao et al. (2019) proposed a protocol testing framework, SeqFuzzer, which trains sequence-to-sequence network models to automatically learn the frame structure of protocols to generate fake but plausible messages as test cases, sending them to perform and monitoring irregular industrial control system behaviors to find vulnerabilities. Lv et al. (2020) proposed an intelligent protocol framework, BLSTM-DCNNFuzz, which uses deep convolution generative adversarial networks to generate protocol messages to fuzz industrial control protocols. Luo et al. (2020) added coverage information to the traditional protocol fuzzer Peach, saving valuable packets and decomposing them to construct new high-quality test cases for future testing.

4.6 Machine learning model

With the development of artificial intelligence, more and more machine learning models are being used in various fields. Fuzzing techniques for machine learning models have attracted the attention of researchers. Fuzzers for machine learning model are shown in Table 19.

CAGFuzz (Zhang et al. 2022) is a coverage-guided adversarial generative fuzzing framework that uses the coverage of neurons as guide and trains an adversarial test case generator to generate as many adversarial test cases as possible under the condition of less disturbance. The generated test cases are suitable for different deep neural networks. DeepHunter (Xie et al. 2019) is coverage-guided fuzzing framework that combines five existing testing criteria including neuron coverage, k-multisection neuron coverage, neuron boundary coverage, strong neuron activation coverage, and top-k neuron coverage, to detect defects of deep neural networks. It uses a seed selection based on diversity and recency, and more fine-grained metamorphic mutation to generate test samples. That has great advantages in achieving high coverage and error detection capabilities. TensorFuzz (Odena et al. 2019) is a coverage-guided library for neural networks which is used to find numerical errors in the trained network, detect inconsistencies between models and quantized version, and display undesirable behavior in the character-level language model. It leverages fast approximate nearest neighbors algorithm to explore the activation function in neural networks as a coverage metric, and determines whether a new coverage is generated by detecting the similarity of activation vectors. Luo et al. (2021) proposed a graph-based fuzzing method which uses six different mutation strategies including graph edges addition, graph edges removal, block nodes addition, block nodes removal, tensor shape mutation, and parameters mutation to generate diversified digraph structure of deep learning models to fuzz deep learning inference engines. Monte Carlo tree search is used to search most promising mutation operators to generate new models. TitanFuzz (Deng et al. 2023) tests deep learning libraries with large language models (LLMs), which uses generative LLMs such as Codex by providing high-quality seed programs, and then uses filling LLMs such as InCoder to mutate the seed programs to generate high-quality seed inputs.

Table 19 Fuzzers for testing machine learning model

5 Conclusion and future direction

The development trends of fuzzing encompass automation, intelligence, technique integration, collaboration, diverse application domains, and open-source testing tools. Fuzzing efficiency and ease of use have improved due to automation, while intelligent techniques have enhanced test case generation and vulnerability discovery. Integration with other techniques has bolstered testing effectiveness, and fuzzing has many applications in a wide range of domains. The availability of open-source fuzzers has stimulated active participation from developers and researchers. These trends underscore the significance of fuzzing as a critical technique in software and security testing. This paper provides a comprehensive review of common fuzzing processes and various types, highlighting the details of fuzzing techniques through the example of CGF and showcasing cutting-edge research. Furthermore, we discuss the diverse application areas of fuzz testing.

Future research directions can focus on the following areas.

  1. (1)

    Learning-aware, smart fuzzing. Traditional approaches often primarily hinge on fixed mutation strategies or weighted test cases generated through grammatical analysis. However, smart fuzzing techniques will continuously collect and learn execution information from the target program. This deep understanding of program states will guide seed generation, improve code coverage, and enable real-time monitoring of program exceptions. By incorporating machine learning and other intelligent algorithms, smart fuzzing can adapt and optimize the fuzzing process based on continuous learning, leading to improved efficiency in vulnerability discovery.

  2. (2)

    Vulnerability-aware fuzzing techniques. Despite significant advancements in CGF, there exists potential for improvement in identifying specific vulnerability types. Future fuzzing research will focus on developing vulnerability-aware techniques. These techniques will continuously capture and analyze characteristics and patterns of known vulnerabilities to guide the fuzzing process. By leveraging the information obtained from previous vulnerabilities, such as input patterns or triggers, fuzzing can be directed toward discovering similar vulnerabilities in different software systems. This strategy augments the vulnerability detection rate and facilitates targeted testing for distinct security weaknesses.

  3. (3)

    Incorporating new techniques in fuzzing. New techniques can help improve the efficiency of fuzzing and the speed of discovery of vulnerabilities. New techniques, such as machine learning (Godefroid et al. 2017), reinforcement learning (Hou and Su 2022; Wang et al. 2021), intelligent optimization (Avci and Avci 2019; Wang et al. 2014; Wang and Tan 2019), LLMs (Deng et al. 2023; Wang et al. 2022), and parallel (Liang et al. 2018) techniques, are integrated into fuzzing to assist fuzzing to increase coverage and the speed of vulnerability discovery.

  4. (4)

    Fuzzing in emerging applications. While fuzzing has been extensively applied in areas such as file fuzzing, protocol fuzzing, and kernel testing, there is a need to explore its applicability in emerging and complex applications. Future research will focus on developing customized fuzzing solutions tailored to specific applications, such as machine learning models, smart contracts, and IOT devices  (D’Angelo et al. 2023, 2023). These new and complex applications present unique challenges and require specialized fuzzing techniques to uncover vulnerabilities effectively.

  5. (5)

    Works related to fuzzing. As fuzzing continues to gain popularity, related research areas will also evolve, such as anti-fuzzing techniques (Güler et al. 2019), exploit generation techniques (Heelan et al. 2019; Wang et al. 2018; You et al. 2017), evaluation methods, benchmarks, and metrics (Böhme and Falk 2020; Ding and Goues 2021; Li et al. 2021).