1 Introduction

A typical software development process consists of four phases: specification, design, coding and testing, where software testing is an indispensable part of ensuring the quality of software products [17]. Traditionally, software products are tested by executing groups of typical test cases selected manually by test engineers for specific test objectives. However, the performance of testing is highly dependent on the ability and experience of the test engineers, which makes testing tasks difficult to manage. Furthermore, its labour-intensive nature makes software testing a high-cost process: it is estimated that approximately half of a project's budget is spent on testing [9]. This explains the increasing interest in automatic software testing in both academia and the software industry [5].

Search-based software testing is the main sub-area of Search-Based Software Engineering (SBSE) concerned with automatic software testing [1, 8]. SBSE is used to generate test data, prioritize test cases, minimize test suites, optimize software test oracles, reduce human labour costs, verify software models, test service-oriented architectures, construct test suites for interaction testing, and validate real-time properties. Given that test data are fundamental to almost all software testing processes, search-based test data generation continues to be the major research area, with an increasing amount of research.

Search-based test data generation successfully automates test data generation using search techniques such as Genetic Algorithms (GAs), Genetic Programming (GP), and Hill Climbing (HC) [15]. In addition to reducing oracle cost, search-based test data generation also improves the quality of test suites, because the space of possible solutions is searched thoroughly. Currently, the majority of existing approaches focus on single-objective test data generation, targeting only one objective (e.g. branch coverage, the degree to which the source code of a program is exercised by a particular test suite), while comparatively little work addresses multi-objective test data generation (e.g. considering branch coverage, execution time and memory consumption simultaneously) [9]. Unfortunately, real-world testing problems are complicated, involve multiple conflicting objectives, and are unlikely to be captured by a single objective.

Among the relatively small body of multi-objective test data generation research, evolutionary multi-objective optimization (EMO) is the main approach, because the population-based search in evolutionary computation can obtain multiple solutions in a single run. However, little research has been devoted to designing EMO methods specifically for multi-objective test data generation. Furthermore, as one of the most popular evolutionary algorithms, GAs have been widely used in search-based test data generation; however, the binary-encoded representation in GAs is not capable of generating complicated test data, and many non-functional objectives are difficult to handle. More effective and problem-specific multi-objective optimization methods could improve both accuracy and efficiency. GP is such a method: it is potentially more powerful in dealing with complicated data because of its flexible representation, but this has not been investigated.

Goals: This paper aims to develop a GP-based multi-objective test data generator using EMO, with the purpose of improving overall performance and contributing to search-based test data generation. To achieve this goal, we develop a GP-based multi-objective test data generation approach. Thanks to its flexible representation, this approach is expected to solve more complicated test data generation problems than existing ones. We investigate the performance of the proposed GP approach by comparing it with other successful evolutionary test data generation approaches.

2 Background

2.1 Search-Based Test Data Generation

In search-based test data generation, the problem is modeled as a numerical function optimization problem and a heuristic is used to solve it. The techniques typically rely on the provision of “guidance”, i.e. objectives, to the search process via feedback from program executions. Search-based test data generation for functional testing generally employs a search and optimization technique with the aim of causing assertions at one or more points in the program to be satisfied. There has been a growing interest in such techniques, and more and more applications of them to software testing can be seen in recent years. To evaluate the applicability of such approaches, many tools have been developed in the past, e.g. TESTGEN [16], a Kalman filter-based approach [2] and a prioritized pairwise approach [5].

Software testing has multiple objectives, such as branch coverage, memory consumption, time consumption and CPU consumption. Because many of these objectives conflict with each other, trade-offs must be considered in order to produce desirable test suites. Additionally, a considerable amount of budget can be saved if test data can be generated automatically. However, software testing is a complicated process with different kinds of testing and a variety of objectives. The most popular technique for testing programs consists of executing the program with a set of test data. Test engineers select a set of configurations for the Program Under Test (PUT), called a test suite, and check the PUT's behavior with them. Since executing the PUT with all possible configurations is infeasible in practice, the program is instead tested with a representative set of test data [5, 6]. Automatic test data generation therefore aims to generate an adequate set of test data automatically, relieving engineers of the task of selecting an adequate set of test data for the PUT. A further objective is the minimization of cost, which can be achieved by minimizing test suite sizes. The software testing problem can thus be defined as follows:

Let P be a program, let \(B_p\) denote its full set of branches, and let \(BranchExec_p(C)\) denote the set of branches covered in P by the execution of a given set of test data C. The branch coverage of the test suite C, denoted \(BrCov_p(C)\), is the ratio between the number of branches traversed in the executions of the program P with C as the set of test data and \(|B_p|\), the total number of branches of the program, i.e., \(BrCov_p(C) = \frac{\vert BranchExec_p(C) \vert }{\vert B_p \vert } \). The adequacy criterion of branch coverage states that a test suite C for a program P is “adequate” when \(BrCov_p(C)=1\). However, it is not always possible to reach such perfect coverage, and even when it is reachable, the cost of testing the entire program may be unaffordable. Consequently, a balance must be struck between coverage and the cost of achieving it. Since the cost of the testing phase depends on the test suite size (the number of test cases in a test suite), minimizing the test suite size, denoted \(\vert C \vert \), should be another goal. However, the majority of existing approaches are single-objective with a focus on branch coverage [9], and ignore the test suite size.
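The coverage ratio defined above can be sketched directly in code. The snippet below assumes a hypothetical `run_put` function that executes the instrumented PUT on one test case and returns the set of branch identifiers it covers; the instrumentation and names are illustrative, not part of the paper.

```python
# Sketch of the branch-coverage objective BrCov_p(C), assuming an
# instrumented execution function. All names here are illustrative.

def branch_coverage(test_suite, run_put, all_branches):
    """BrCov_p(C) = |BranchExec_p(C)| / |B_p|."""
    covered = set()
    for test_case in test_suite:
        covered |= run_put(test_case)  # branches hit by this execution
    return len(covered & all_branches) / len(all_branches)

# A toy PUT with three branches; a test case is one integer input.
def toy_run_put(x):
    hit = {"entry"}
    hit.add("then" if x > 0 else "else")
    return hit

all_branches = {"entry", "then", "else"}
print(branch_coverage([5], toy_run_put, all_branches))      # covers 2 of 3
print(branch_coverage([5, -1], toy_run_put, all_branches))  # full coverage
```

Note that coverage is computed over the union of the branches hit by all test cases, so adding a test case can never decrease it.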

2.2 Related Work

In the research area of test data generation, the majority of research focuses on single-objective test data generation, aiming for an optimal solution on a particular objective. Only a few papers address multi-objective test data generation.

Lakhotia et al. [11] implement and compare two multi-objective approaches for test data generation: a Pareto GA and a weighted GA. In addition to the objective of branch coverage, they use memory consumption as the second objective. A limitation, however, is that it is difficult to choose the weights for each objective. Pinto et al. [14] propose two new solution representations for multi-objective test data generation problems. To obtain a flexible representation, they use an array list, rather than the binary string of a GA, to represent an individual, where each item in the array list is a single test datum. Different types of solutions can be explored in the problem search space with this representation. To accommodate more complex testing problems such as object-oriented programs, another representation is also proposed. Ghiduk et al. [7] propose a GA-based multi-objective test data generator that automatically generates test data for data-flow coverage criteria.

Oster and Saglietti [13] propose a novel approach to object-oriented test data generation. The approach is fully automatic and can be applied to any Java program without restriction. It focuses on maximizing branch coverage and minimizing test suite size, which are the most popular and useful objectives in test data generation. The experimental results show that the approach, which adopts evolutionary computation methods, performs significantly better than random testing.

3 Proposed Approach

In this section, a GP-based approach is proposed to generate software test data, and two GA-based approaches, GA-B and GA-R, are also investigated for comparison. GP [10] is an evolutionary approach to generating computer programs, commonly represented by trees, for solving a given problem. A typical GP representation is a tree-like structure, where a function set consisting of a number of operators forms the internal nodes of the tree, and a terminal set consisting of problem variables and random numbers forms the leaf nodes. GP maintains a population of trees, where each individual/tree represents a candidate solution for the target problem. A typical GP evolutionary process starts with a random initial population; individuals are evaluated with a fitness function and updated by applying selection, mutation, crossover and reproduction operators. The evolutionary process stops when a predefined stopping criterion is met. In the rest of this section, we discuss the representation, the crossover and mutation operators, the objectives, and the evolutionary multi-objective frameworks.
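As an illustration of the tree evaluation just described, the following sketch uses an arithmetic function set and numeric terminals. The tuple-based node layout and the protected-division convention are our assumptions for illustration, not the paper's implementation.

```python
# A minimal GP expression tree: internal nodes come from a function
# set, leaves from a terminal set. Nodes are tuples (op, left, right)
# or bare numbers; this layout is an illustrative assumption.

def pdiv(a, b):
    """Protected division, a common GP convention: avoids divide-by-zero."""
    return a / b if b != 0 else 1.0

FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": pdiv,
}

def evaluate(node):
    """Recursively evaluate a tree to the value it encodes."""
    if isinstance(node, (int, float)):
        return node                      # terminal node
    op, left, right = node               # function node
    return FUNCTIONS[op](evaluate(left), evaluate(right))

tree = ("+", 3, ("*", 2, 5))             # encodes 3 + 2*5
print(evaluate(tree))                    # prints 13
```

In the test data generation setting, each such tree evaluates to the value of one PUT argument, which is what makes the representation flexible enough for non-trivial input types.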

3.1 Representations

A typical program in a piece of software may have one or more arguments, several branches and many statements. The test suite for such a program contains a set of configurations, each of which assigns particular values to the arguments. The size of a test suite is its number of configurations, and the branch coverage is the proportion of branches executed after running all the configurations in the test suite.

To represent the data structures in test data generation problems, this paper adopts a three-layer data representation for each GP individual, as shown in Fig. 1. Unlike traditional single-tree GP, each individual contains multiple trees. The first layer is a typical GP tree, whose terminal and function sets are configured according to the argument types. The second layer is a set of GP trees: each tree represents one argument of the PUT, so the layer's length is fixed by the PUT (e.g. for a PUT with three arguments, the length of the second layer is three). Each set of trees in this layer is a single test case. The third layer represents the entire test suite; its length is not fixed, because it is the test suite size, i.e. one of the objectives, and it may change after each crossover and mutation operation.

Fig. 1. Three-layer data representation

For comparison purposes, two GA-based approaches are also developed in this paper. One represents solutions using binary strings and is named GA-B; the other represents solutions using real numbers and is named GA-R. Both adopt the same three-layer representation, but in the first layer GA-B uses binary strings while GA-R uses real numbers.
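One possible way to realise the three-layer structure of Fig. 1 is sketched below. The tuple-based trees, depth bound and value ranges are illustrative assumptions; only the layering (tree, test case, test suite) follows the paper.

```python
import random

# Sketch of the three-layer representation: an individual (layer 3) is
# a variable-length list of test cases; a test case (layer 2) is a
# fixed-length list of k GP trees, one per PUT argument; each tree
# (layer 1) is grown from function/terminal sets. Trees are nested
# tuples (op, left, right) or numeric terminals (an assumption).

def random_tree(depth, rng):
    """Grow a random expression tree up to the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.uniform(-10, 10)                  # terminal node
    op = rng.choice(["+", "-", "*", "/"])
    return (op, random_tree(depth - 1, rng), random_tree(depth - 1, rng))

def random_individual(num_args, max_suite_size, max_depth, rng):
    """Build one individual: a test suite of 1..max_suite_size cases."""
    suite_size = rng.randint(1, max_suite_size)      # layer-3 length
    return [[random_tree(max_depth, rng) for _ in range(num_args)]
            for _ in range(suite_size)]

rng = random.Random(42)
ind = random_individual(num_args=3, max_suite_size=5, max_depth=3, rng=rng)
print(len(ind), len(ind[0]))   # suite size, and 3 trees per test case
```

Because the layer-3 length is part of the genotype, the suite size objective is optimised directly by the variation operators rather than by a separate repair step.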

3.2 Crossover

The crossover operation plays an important role in the GP approach. Due to the data representation structure, this paper adopts a modified crossover operation. Before crossover, two individuals are selected by a selection operation; in this work, tournament selection is used. Let the selected individuals be \(I^1\) and \(I^2\). \(I^1\) and \(I^2\) may have different lengths, that is, different numbers of test cases. Figure 2 shows an example of the structure of \(I^1\) and \(I^2\), where n is the length of individual \(I^1\), m is the length of individual \(I^2\), k is the number of arguments of the PUT, and \(TS^1_{2,k}\) is read as the kth tree (for the kth argument of the PUT) of the second test case in parent \(I^1\).

Fig. 2. The two selected individuals

Fig. 3. The crossover operator

After selecting the two parents \(I^1\) and \(I^2\), we randomly select a tree set \(TS^1_a\) from parent \(I^1\) and a tree set \(TS^2_b\) from parent \(I^2\). Then we apply the crossover operation to the two tree sets. In detail, for each tree \(TS^1_{a,i}\), i = 1, 2, \(\cdots \), k, we perform a crossover operation with the tree \(TS^2_{b,i}\). That is, the crossover operation is applied to the GP trees in the same position of the two selected tree sets. For the two GP trees under the crossover operation, we randomly select a node from each of \(TS^1_{a,i}\) and \(TS^2_{b,i}\), where the probability of selecting a function node is \(q_f\) and that of a terminal node is \(q_t\), with \(q_f + q_t = 1\). After selecting one node from each of the two GP trees, we swap the subtrees rooted at the selected nodes. The whole step is then repeated s times, where s is the length of the shorter individual. Figure 3 shows an example of the crossover operation, where crossover is applied to the second and third trees in both \(TS^1_{a}\) and \(TS^2_{b}\).

A detailed description of the modified crossover operation is given as follows:

  • Step (1) Randomly select t (tournament size) individuals from the population using the tournament selection;

  • Step (2) Select the best two individuals (\(I^1\) and \(I^2\)) from the tournament for the crossover operation;

  • Step (3) Randomly select two tree sets (\(TS^1_a\) and \(TS^2_b\)) from \(I^1\) and \(I^2\);

  • Step (4) Select trees from the same position of \(TS^1_a\) and \(TS^2_b\) as \(TS^1_{a,i}\) and \(TS^2_{b,i}\);

  • Step (5) Choose a node type: a function node is chosen with probability \(q_f\) and a terminal node with probability \(q_t\). Randomly select a node of the chosen type from each of the trees \(TS^1_{a,i}\) and \(TS^2_{b,i}\);

  • Step (6) Swap the subtrees rooted at the selected nodes of \(TS^1_{a,i}\) and \(TS^2_{b,i}\).

  • Step (7) Repeat Steps (3) to (6) s times, where s is the length of the shorter individual.

This crossover operation has some interesting aspects. First, crossover is applied to individuals of different lengths. Second, the crossover operation is only applied to trees in the same position of a tree set (test case). As a result, programs with different types of arguments can also be handled by this GP approach.
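The steps above can be sketched as follows. Trees are nested lists here (an assumption) so that subtrees can be swapped in place; tournament selection is omitted, and node selection is uniform rather than weighted by \(q_f\)/\(q_t\), so this is a simplified illustration rather than the authors' implementation.

```python
import random

# Same-position subtree crossover (simplified sketch). Trees are
# nested lists [op, left, right] or numeric leaves so they are mutable.

def all_subtree_slots(tree, path=()):
    """Yield paths to every node; a path is a tuple of child indices."""
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from all_subtree_slots(child, path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def set_at(tree, path, subtree):
    for i in path[:-1]:
        tree = tree[i]
    tree[path[-1]] = subtree

def crossover_trees(t1, t2, rng):
    """Swap randomly chosen (non-root) subtrees between two trees."""
    slots1 = [p for p in all_subtree_slots(t1) if p]
    slots2 = [p for p in all_subtree_slots(t2) if p]
    if not slots1 or not slots2:      # a bare terminal has no swap point
        return
    p1, p2 = rng.choice(slots1), rng.choice(slots2)
    s1, s2 = get(t1, p1), get(t2, p2)
    set_at(t1, p1, s2)
    set_at(t2, p2, s1)

def crossover_individuals(ind1, ind2, rng):
    """Cross a randomly chosen test case from each parent, tree by
    tree at matching argument positions, repeated s times (s = length
    of the shorter individual)."""
    for _ in range(min(len(ind1), len(ind2))):
        ts1, ts2 = rng.choice(ind1), rng.choice(ind2)
        for tree_a, tree_b in zip(ts1, ts2):   # same argument position
            crossover_trees(tree_a, tree_b, rng)
```

Restricting swaps to the same argument position is what keeps each test case type-consistent with the PUT's signature.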

For the GA-B approach, the crossover operation is a modified one-point crossover: it randomly selects two parents and applies one-point crossover to the selected test data using a strategy similar to that of the GP approach. However, because of the nature of real-number encodings, the GA-R approach does not apply a classical crossover operation to individuals.

3.3 Mutation

The mutation operation is also important for the test data generation problem, as it changes the length of the selected individuals. The mutation operator adds a new test case to an individual with probability \(P_{add}\) (Mutation-Add), deletes one test case with probability \(P_{delete}\) (Mutation-Delete) and keeps the individual unchanged with probability \(P_{unchanged}\) (Mutation-Unchanged), where \(P_{add} + P_{delete}+ P_{unchanged} = 1\). Through this operation, new test cases can be added to individuals, and their lengths can change after each mutation operation.

The GA-B and GA-R approaches adopt the same mutation operation as the GP approach.
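A minimal sketch of this length-changing mutation, assuming a caller-supplied generator for fresh test cases; the default probabilities follow the experimental settings in Sect. 4.2, and the guard against deleting the last test case is our illustrative choice.

```python
import random

# Length-changing mutation: add a test case with probability p_add,
# delete one with probability p_delete, otherwise leave the individual
# unchanged. `new_test_case` is a hypothetical generator, e.g. the
# random initialiser of Sect. 3.1.

def mutate(individual, new_test_case, rng, p_add=0.2, p_delete=0.6):
    r = rng.random()
    if r < p_add:
        individual.append(new_test_case())           # Mutation-Add
    elif r < p_add + p_delete and len(individual) > 1:
        individual.pop(rng.randrange(len(individual)))  # Mutation-Delete
    # otherwise: Mutation-Unchanged (prob. 1 - p_add - p_delete)
    return individual
```

Because mutation is the only operator that creates entirely new test cases, \(P_{add}\) effectively controls how aggressively the search explores larger suites.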

3.4 Two Objectives

In this work, we deal with the multi-objective test data generation problems with two conflicting objectives of maximizing the branch coverage and minimizing the test suite size.
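With both objectives cast as minimisation, for instance \(1 - BrCov_p(C)\) and \(\vert C \vert\), Pareto dominance between two candidate suites can be checked as below; this particular objective transformation is our illustrative choice, not prescribed by the paper.

```python
# Two-objective comparison sketch: a solution dominates another when
# it is no worse in both objectives and strictly better in at least one.

def objectives(coverage, suite_size):
    """Fold (coverage, size) into a pair of minimised objectives."""
    return (1.0 - coverage, suite_size)

def dominates(a, b):
    """Pareto dominance for minimised objective vectors a and b."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

full_small = objectives(1.0, 4)    # full coverage with 4 test cases
full_big   = objectives(1.0, 9)    # full coverage with 9 test cases
partial    = objectives(0.7, 2)    # partial coverage, tiny suite
print(dominates(full_small, full_big))   # True: same coverage, smaller
print(dominates(full_small, partial))    # False: a genuine trade-off
```

The last comparison illustrates why the two objectives conflict: a smaller suite with lower coverage is incomparable to a larger suite with full coverage, so the search must maintain a front of trade-off solutions.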

3.5 Multi-objective Frameworks

GP was originally proposed as a single-objective approach, while many different evolutionary multi-objective optimization methods have been proposed in recent years. As this is the first work to use GP for multi-objective test data generation, we introduce popular evolutionary multi-objective search mechanisms into the GP approaches:

  • The Non-dominated Sorting Genetic Algorithm II (NSGAII), introduced by Deb et al. [3].

  • The Strength Pareto Evolutionary Algorithm (SPEA2), a multi-objective evolutionary algorithm proposed by Zitzler et al. [18].

  • The Multi-Objective Cellular Genetic Algorithm (MOCell), proposed by Nebro et al. [12], a cellular genetic algorithm which is relatively new but performs better than NSGAII in some situations [5, 12].

The details of NSGAII, SPEA2 and MOCell are not presented here; readers are referred to the corresponding literature. The overall structure of the proposed GP-based multi-objective test data generators follows the basic structure of these three multi-objective approaches, using the new representation, crossover, mutation and objectives described above.
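All three frameworks rely on Pareto ranking of the population. A minimal, framework-agnostic sketch of extracting the first non-dominated front from a set of objective vectors (assuming all objectives are minimised) is given below; the frameworks differ mainly in how they rank the remaining solutions and preserve diversity.

```python
# First-front extraction sketch, self-contained: keeps every point not
# dominated by any other point (all objectives minimised).

def dominates(a, b):
    """a dominates b: no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def first_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    front = []
    for i, p in enumerate(points):
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i):
            front.append(p)
    return front

pts = [(0.0, 4), (0.0, 9), (0.3, 2), (0.5, 1)]
print(first_front(pts))   # (0.0, 9) is dominated by (0.0, 4)
```

NSGAII peels off successive fronts in this way, SPEA2 weights dominance into a strength value, and MOCell applies the same ranking within a cellular neighbourhood structure.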

4 Experiment Configuration

4.1 Benchmark

Ferrer et al. [4] developed a benchmark generator for software testers, aiming to provide a standard benchmark of programs for comparing test data generators. The generator automatically produces programs in which all branches are reachable, on the basis of features defined by the users. As a result, it can generate significantly different programs for comprehensively comparing multi-objective test data generators. The current version of the generator can only produce programs with integer inputs, so we additionally adopt programs with double inputs.

The experiments used 160 Java programs with integer inputs and 160 Java programs with double inputs as benchmarks to evaluate the performance of the proposed multi-objective methods. These programs cover four nesting degrees on the basis of their numbers of branches.

4.2 Parameter Settings

Using the three representations (i.e. GP with tree-based encoding, GAs with binary encoding and GAs with real-value encoding) and the three multi-objective frameworks (i.e. NSGAII, SPEA2 and MOCell), nine different methods are investigated in this paper, denoted NSGP, SPGP, MCGP, NSGAB, SPGAB, MCGAB, NSGAR, SPGAR and MCGAR.

In the three GP-based methods, each individual is encoded in the tree-based three-layer structure. The population size is 200 and the maximum number of generations is 100, i.e. 20000 evaluations in total. Binary tournament selection and the modified crossover and mutation operators described in the previous section are applied, where the mutation operation adds a new test case with probability 0.2, deletes one test case with probability 0.6 and keeps the individual unchanged with probability 0.2. The function set consists of addition, subtraction, multiplication and protected division. The terminal set consists of random real numbers. The maximum tree depth is 5.

In the six GA-based methods, the total number of evaluations is also 20000, but the population size is 20 [11, 14]. Binary tournament selection and the modified crossover and mutation operators are also applied in the GA-based methods, with the same settings as in the GP methods. In the GA-R methods, the range of random numbers is from \(-32768\) to 32767, the range of 16-bit signed integers. In the GA-B methods, each number is encoded as an 8-bit string, in accordance with the input ranges of the benchmarks.

For the purpose of evaluating the evolutionary approaches, a random multi-objective test data generation approach is implemented as a baseline. It randomly produces test suites for the input Java programs, and its final result is the set of all non-dominated solutions found. The total number of solutions generated by the random method is 20000, the same as the total number of evaluations in the evolutionary multi-objective methods.

All the methods were run for 50 independent runs on each benchmark program. Hypervolume (HV), a commonly used multi-objective performance indicator, is used to compare the performance of the different algorithms, where the best solutions achieved by all the methods are used as reference points.
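For the bi-objective case considered here, HV can be computed with a simple sweep over the sorted front, as sketched below. The reference point in the example is illustrative; in the experiments it is derived from the solutions found by all methods.

```python
# Hypervolume sketch for a bi-objective minimisation front: sort the
# non-dominated points on the first objective and sum the rectangles
# they cut out of the region bounded by the reference point.

def hypervolume_2d(front, ref):
    """front: non-dominated (f1, f2) points, both objectives minimised;
    ref: reference point worse than every front point in both objectives."""
    pts = sorted(front)                  # ascending f1, so descending f2
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        hv += (ref[0] - f1) * (prev_f2 - f2)   # rectangle for this point
        prev_f2 = f2
    return hv

front = [(0.0, 4), (0.3, 2), (0.5, 1)]
print(hypervolume_2d(front, ref=(1.0, 10)))   # prints 7.9
```

A larger HV means the front is both closer to the ideal point and better spread, which is why a single HV number can rank whole fronts against each other.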

5 Results and Discussions

5.1 Integer Problems

Since it is impractical to present the results of the nine algorithms on all 160 benchmark programs, we summarize the results in two tables. Table 1 shows, for each nesting degree, the number of times each algorithm has the best median HV value (median of the 50 runs) among the nine algorithms. The total is greater than 160 because two or more algorithms may share the same best HV value on a benchmark program. Since branch coverage is more important than test suite size, we also present comparisons between the nine algorithms in terms of the median branch coverage results, shown in Table 2.

Table 1. Summary of the integer results
Table 2. Summary of the integer results according to branch coverage

According to Table 1, overall, MCGP achieved the best HV values or the best solutions on 88 out of the 160 benchmark programs, i.e. better than the other eight methods. The second best algorithm is NSGP, which obtained the best solutions in 55 cases. NSGAR and SPGAR obtained the best solutions only in a small number of cases. The random search baseline did not obtain any best solution on the 160 benchmark problems. These patterns can be observed on the benchmarks with all four nesting degrees.

Comparing the three multi-objective frameworks, MOCell provides a better search mechanism for automated test data generation problems, as evidenced by the superior performance of MCGP. SPEA2 performed the worst: only SPGAR achieved the best solution, in 4 cases. In terms of representation, the proposed GP representation achieved the best solutions on 143 out of the 160 benchmark programs, much better than the binary and real-value representations. The reasons might be that the tree structure is more flexible and the three-layer structure is rich enough to produce good solutions. The binary encoding does not outperform the real-value encoding, possibly because each integer is encoded with 8 bits, leading to a higher dimensionality for the binary encoding than for the real-value encoding. All three encoding schemes performed better than random search.

Furthermore, according to Table 2, the three GP methods achieved the best branch coverage on almost all the benchmark problems. One reason might be that the GP approach has a larger search space, so it has the potential to find test cases covering more branches. Additionally, the GA-B, GA-R and GP approaches all perform much better than the random search approach.

Table 3. Summary of the double results
Table 4. Summary of the double results according to branch coverage

5.2 Double Problems

Table 3 shows, for each nesting degree of the double benchmark problems, the number of times each algorithm has the best median HV value (median of the 50 runs) among the nine algorithms. The sum is greater than 160 because two or more algorithms may share the same best HV value on a benchmark program. Since branch coverage is more important than test suite size, we also present comparisons between the nine algorithms in terms of the median branch coverage results, shown in Table 4.

According to Table 3, the NSGP and MCGP methods achieved the best HV values or the best solutions on almost all the benchmark problems; only in 3 cases did SPGP achieve the best or equal-best solutions compared with NSGP and MCGP. All of them performed better than the random search method. Comparing the three multi-objective frameworks, both NSGAII and MOCell performed better than SPEA2. Furthermore, the GP methods with the tree-based representation achieved the best solutions on all the benchmark problems, and according to Table 4, the three GP methods achieved the best branch coverage on almost all the benchmark problems, likely for the same reasons described above.

6 Conclusions and Future Work

This paper presents a new approach to multi-objective test data generation using GP, where a new representation, and corresponding crossover and mutation operators were proposed to generate test suites with the objectives of maximizing the branch coverage and minimizing the test suite size. Three multi-objective optimization frameworks (NSGAII, SPEA2 and MOCell) are used to investigate the potential of GP based multi-objective test data generation approaches. Two different kinds of benchmark programs are adopted to examine the performance of the new approaches. A random search approach and two GA-based methods are implemented as baselines for comparison. The results suggest that the new GP approaches perform the best on the two types of benchmark programs.

This is a preliminary work, but it shows the ability of the flexible tree-based representation in GP for multi-objective software test data generation. In the future, we would like to further develop tree-based GP techniques for multi-objective test data generation problems. This paper involves only two software testing objectives, branch coverage and test suite size. More objectives, such as execution time, CPU consumption, memory consumption, path coverage, instruction coverage and line coverage, can be involved and analyzed in future work. In addition, more complex problems can be handled by extending the tree-based GP approach; for example, dealing with classes and lists may be interesting for handling the increasing number of object-oriented programs.