
1 Introduction

Software testing is one of the most important means of ensuring software quality. Statistics show that software testing generally accounts for more than 50 % of the total cost of software development [1]. With increasing software complexity, software testing is becoming more difficult and expensive. How to generate test data intelligently and automatically, so as to improve the efficiency of software testing, has become one of the most prominent research topics [2].

Generating test data that covers the software output domains is an important part of software test design; in particular, it is essential in functional testing. Statistical results and cause analysis show that a large number of software failures occur on the boundaries of the output domains, or when an output value falls in a special domain that is normally difficult to reach. Typically, covering all output domains and designing test cases specifically for domain boundaries achieves good results. However, for a given output, automatically generating the corresponding input test data from the software requirements specification alone is difficult, which is one of the reasons why there are few studies on it.

At present, in functional testing, test data generation generally depends on manual selection due to the lack of formal specifications. Exploratory testing [3], which has become popular recently, is a typical form of manual testing. In structural testing, due to the limitations of symbolic execution and other automation technologies, software testers often have to generate test data manually to cover specific goals. Random testing can achieve a high degree of automation, but it generates too much test data, and this data cannot be guaranteed to cover the test objectives; this is why random testing is inefficient at exposing faults [4].

Evolutionary testing, which uses a genetic algorithm to transform the test data generation problem into a numerical optimization problem, is one of the hot topics in automatic test data generation [4–6]. By simulating the process of biological evolution, a genetic algorithm (GA) searches for the optimal solution of an optimization problem. A GA maintains a population of potential solutions, samples randomly in the entire search space, and evaluates each sample according to a fitness function. Operators such as selection, crossover and mutation are applied iteratively (each iteration corresponding to one cycle of biological evolution) to search for a global optimum until the termination condition is met.

In evolutionary testing, the search space of the GA is the input domain of the software, and the optimal solution is test data that meets the specified testing purpose. The search process can be automated, which helps improve software testing efficiency. Currently, GA is widely used in structural testing, taking coverage ability as the optimization goal [7–10]. Research on GA in functional testing is less extensive; existing methods include one based on Z language specifications [11] and one based on pre/post-conditions [12]. Because of the high cost of formalization, these methods are difficult to apply to large systems.

To this end, this paper proposes an output-oriented test data generation method suitable for functional testing. It uses gray-box technology to transform coverage of the software output domains into coverage of branches of a pseudo-path, and then draws on ideas from structural testing. Experimental results show that the method is more efficient than random testing and manual testing.

2 Output-Oriented Functional Testing

2.1 Problem Formulation

Depending on whether the internal structure of the software is considered, software testing can be divided into black-box testing and white-box testing. Black-box testing, also known as functional testing, treats the tested software as a black box and uses only the relationship between its inputs and outputs. White-box testing, also known as structural testing, designs test cases by analyzing the internal structure of the tested software. Black-box and white-box testing each have their advantages and disadvantages. Gray-box testing lies between the two; by combining their advantages, it can often achieve better test results [13].

In gray-box testing [13], a functionality overview map is first drawn based on the software requirements specification; then, using knowledge of the structure of the source code, the map is refined and expanded into an advanced software design model named SHDM; finally, criteria such as node coverage, edge coverage or path coverage are selected to design test cases. In this way, gray-box technology can be used to transform coverage of the software output domains into coverage of branches of a pseudo-path.

In evolutionary testing for output domain coverage, the population is a set of test data, an individual is a single test datum, and the test target is a specified pseudo-path (obtained through the gray-box testing technique). The mission of the GA is to optimize the population, guided by the fitness function, until it obtains a test datum whose execution trace covers the test target.

The core issue here is how to construct an appropriate fitness function to evaluate the merit of test data with respect to the test target.

2.2 Evaluation of Test Data

Without loss of generality, assume the test target is a branch sequence \( aim = \left\langle {b_{1} ,b_{2} , \cdots ,b_{n} } \right\rangle \) and the execution trace of a test datum \( data \) is \( trace = \left\langle {x_{1} ,x_{2} , \cdots ,x_{m} } \right\rangle \), where each \( b_{i} \left( {1 \le i \le n} \right) \) and \( x_{j} \left( {1 \le j \le m} \right) \) is a branch. Their deviation is computed by formula (1):

$$ diff\left( {aim,trace} \right) = length\left( {aim} \right) - match\left( {aim,trace} \right) $$
(1)

Here, \( length \) is the length function: \( length\left( {aim} \right) = n \) denotes the length of the sequence \( aim \). \( match \) is the match function: \( match\left( {aim,trace} \right) \) denotes the degree of match between the trace of test datum \( data \) and the target sequence \( aim \).

The value of the match function lies in \( \left[ {0,\,n} \right] \). A larger value means a higher degree of match and a smaller deviation \( diff \); a smaller value means a lower degree of match and a larger deviation \( diff \). A properly defined match function therefore reflects the quality of a test datum and helps evaluate it with respect to the test target.

2.3 Design of Match Function

For a test target \( aim = \left\langle {b_{1} ,b_{2} , \cdots ,b_{n} } \right\rangle \) and a trace \( trace = \left\langle {x_{1} ,x_{2} , \cdots ,x_{m} } \right\rangle \), we call \( \left\langle {x_{p} ,x_{p + 1} , \cdots ,x_{q} } \right\rangle \) a sub-trace of \( trace \) with power \( \left( {q - p + 1} \right) \) corresponding to \( aim \), if the string \( x_{p} x_{p + 1} \cdots x_{q} \) is a substring of \( b_{1} b_{2} \cdots b_{n} \) and \( x_{p} = b_{1} \).

Let \( \Theta { = }\left\{ {subtrace} \right\} \) denote the set of all sub-traces of \( trace \) corresponding to \( aim \). The match function of \( trace \) with respect to \( aim \) is defined by formula (2):

$$ match\left( {aim,trace} \right) = \left\{ {\begin{array}{*{20}l} {\mathop {\max }\limits_{subtrace \in \Theta } \left( {length\left( {subtrace} \right)} \right),} & {if\;\Theta \ne \emptyset } \\ {0,} & {if\;\Theta = \emptyset } \\ \end{array} } \right. $$
(2)

Obviously, the value of \( match \) is in \( \left[ {0,\,n} \right] \). The larger the value, the higher the degree of matching.

For example, if \( aim = \left\langle {b_{1} ,b_{2} ,b_{3} } \right\rangle \), \( trace_{1} = \left\langle {b_{1} ,b_{4} ,b_{1} ,b_{2} ,b_{3} } \right\rangle \) and \( trace_{2} = \left\langle {b_{1} ,b_{4} ,b_{1} ,b_{2} ,b_{5} } \right\rangle \), then \( match\left( {aim,trace_{1} } \right) = 3 \) and \( match\left( {aim,trace_{2} } \right) = 2 \).
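As a minimal illustrative sketch (our own, not part of the original method description), the match and deviation functions of formulas (1) and (2) can be implemented as follows; branches are represented here simply as strings, and the example from the text is used as a check.

```python
def is_sublist(needle, haystack):
    """True if `needle` occurs as a contiguous segment of `haystack`."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))

def match(aim, trace):
    """Match function of formula (2): the largest power (length) of a sub-trace of
    `trace` that starts with the first target branch and occurs contiguously in
    `aim`; 0 if no such sub-trace exists."""
    best = 0
    for p in range(len(trace)):
        if trace[p] != aim[0]:
            continue
        q = p
        # extend the sub-trace while it still matches a segment of aim
        while q < len(trace) and is_sublist(trace[p:q + 1], aim):
            best = max(best, q - p + 1)
            q += 1
    return best

def diff(aim, trace):
    """Deviation of formula (1)."""
    return len(aim) - match(aim, trace)

# The example from the text: match equals 3 for trace1 and 2 for trace2.
aim = ["b1", "b2", "b3"]
trace1 = ["b1", "b4", "b1", "b2", "b3"]
trace2 = ["b1", "b4", "b1", "b2", "b5"]
assert match(aim, trace1) == 3 and match(aim, trace2) == 2
assert diff(aim, trace1) == 0 and diff(aim, trace2) == 1
```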

3 Test Data Generation Based on Genetic Algorithm

3.1 Process of Genetic Algorithm

In a genetic algorithm (GA), each feasible solution to the problem is called a “chromosome” and corresponds to an individual of the population. A chromosome is a coded string obtained with a specific encoding approach, and each unit of the coded string is called a “gene”. By comparing fitness values, GA distinguishes good chromosomes from bad ones: a chromosome with a larger fitness value is better.

In GA, the fitness function computes the fitness value of each chromosome; selection chooses individuals according to certain rules to form the parent population; crossover exchanges part of the genes of two individuals to generate their offspring chromosomes; and mutation changes a few genes of a selected chromosome to obtain a new one.

The main steps of GA are as follows [15], as shown in Fig. 1:

Fig. 1. Flowchart of GA

STEP1. Initialize a population with \( N \) chromosomes; generate the genes of every chromosome randomly and keep them within the range of the problem definition. Denote the generation counter \( Generation \) and let \( Generation = 0 \).

STEP2. Evaluate each chromosome using the fitness function, calculate the fitness value of every chromosome, and save the chromosome with the largest fitness as \( Best \).

STEP3. Perform selection, for example by roulette wheel, to generate a population of \( N \) selected chromosomes.

STEP4. Perform crossover with probability \( p_{c} \). Each selected couple of chromosomes exchanges some genes to generate two offspring, which replace their parents; the other chromosomes remain in the population.

STEP5. Perform mutation with probability \( p_{m} \). New chromosomes are generated by altering a few genes of the selected chromosomes; the non-selected chromosomes remain in the population.

STEP6. Re-evaluate each chromosome using the fitness function. If the largest fitness value in the new population is better than \( Best \)'s, replace \( Best \).

STEP7. Let \( Generation++ \). If \( Generation \) exceeds the specified maximum generation or \( Best \) meets the specified error requirement, terminate the algorithm; otherwise, go to STEP3.
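As a rough skeleton only (our own sketch, not the authors' implementation), the flow of Fig. 1 can be written as follows. Here `fitness`, `roulette_select`, `crossover` and `mutate` are the operator sketches given in Sects. 3.2–3.4 below, `diff` comes from Sect. 2, `run_program` stands for a hypothetical instrumented execution that returns the branch trace of a decoded chromosome, and the parameter values are placeholders.

```python
import random

def genetic_search(aim, pop_size=50, max_generation=200, pc=0.8, pm=0.01, chrom_len=32):
    """Skeleton of the GA flow in Fig. 1 (parameter values are placeholders)."""
    # STEP1: random initial population of binary chromosomes
    population = [[random.randint(0, 1) for _ in range(chrom_len)]
                  for _ in range(pop_size)]
    # STEP2: initial evaluation, remember the best chromosome found so far
    best = max(population, key=lambda c: fitness(aim, c))
    for generation in range(max_generation):
        # STEP3: selection (e.g. roulette wheel)
        fits = [fitness(aim, c) for c in population]
        population = roulette_select(population, fits)
        # STEP4 and STEP5: crossover with probability pc, mutation with probability pm
        population = crossover(population, pc)
        population = mutate(population, pm)
        # STEP6: re-evaluation, keep the best chromosome found so far
        candidate = max(population, key=lambda c: fitness(aim, c))
        if fitness(aim, candidate) > fitness(aim, best):
            best = candidate
        # STEP7: stop early once the target branch sequence is fully covered
        if diff(aim, run_program(best)) == 0:
            break
    return best
```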

3.2 Design of Representation

To facilitate genetic operations, each individual typically needs to be encoded into another representation (a chromosome). The coding strategy depends largely on the nature of the problem; common schemes include binary coding, real coding and ordered string coding. With binary coding, an individual is represented as a binary bit string (vector), which is structurally similar to a biological chromosome; this makes it easy to explain the genetic algorithm in biological terms and to apply the genetic operators. With real coding, an individual is represented as a real-valued vector, whose structure makes it easy to introduce relevant domain knowledge and thereby increase the search capability of the genetic algorithm.

Since test data can generally be expressed as a numeric vector, we adopt binary encoding in this paper; real coding is also applicable. The size of the initial population affects the search capability and efficiency of GA, and usually ranges from 20 to 150.
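As an illustration only (assuming an integer-valued input parameter with a known range; the names and bit width are ours), binary encoding and decoding might look like this:

```python
def encode(value, low, bits=16):
    """Encode an integer input value (offset from `low`) as a fixed-width bit list."""
    scaled = value - low
    return [(scaled >> i) & 1 for i in reversed(range(bits))]

def decode(chromosome, low, high):
    """Decode a bit list back into an integer input value, clamped to [low, high]."""
    value = 0
    for bit in chromosome:
        value = (value << 1) | bit
    return min(low + value, high)

# Example: an input parameter ranging over [0, 1000]
chrom = encode(637, low=0, bits=16)
assert decode(chrom, low=0, high=1000) == 637
```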

3.3 Design of Fitness Function

The fitness function calculates the fitness value of each chromosome and guides the search direction of GA, so it is the key part of a genetic algorithm implementation. Generally, the fitness value lies between 0 and 1; individuals with larger values are better and have a greater probability of surviving into the next generation.

Here, we adopt the function \( f \) of formula (3) as the fitness function:

$$ f = \left\{ {\begin{array}{*{20}l} {\frac{1}{diff},} & {if\;diff \ge \varepsilon ;} \\ {\frac{1}{\varepsilon },} & {if\;diff < \varepsilon .} \\ \end{array} } \right. $$
(3)

Here \( \varepsilon > 0 \) is a very small positive number whose value depends on the specific circumstances (for example \( 10^{{{ - }5}} \)); since \( diff \) takes non-negative integer values, \( diff < \varepsilon \) means the target is hit, and the fitness then takes its maximum value \( 1/\varepsilon \). \( diff \) is defined by formula (1).
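Under this reading of formula (3), a hedged sketch of the fitness computation is given below; `run_program` is again the hypothetical helper that decodes a chromosome, executes the instrumented program and returns the executed branch trace, and `diff` is the function from Sect. 2.

```python
EPSILON = 1e-5  # the small positive constant epsilon of formula (3)

def fitness(aim, chromosome):
    """Fitness of formula (3): the smaller the deviation, the larger the fitness.
    `run_program` is a hypothetical helper that decodes the chromosome, executes
    the instrumented program and returns the executed branch trace."""
    d = diff(aim, run_program(chromosome))
    return 1.0 / d if d >= EPSILON else 1.0 / EPSILON
```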

3.4 Design of Other Operator

Roulette wheel selection is usually used as the selection operator. For a population with \( k \) individuals, where \( fitness_{i} \) denotes the fitness of the \( i^{th} \) individual, roulette wheel selection consists of five steps: first, calculate each individual's fitness share \( fitness_{i} /\sum {fitness_{i} } \), which represents its capability to yield offspring; second, sort the individuals in descending order of fitness share; third, for each individual, accumulate the fitness shares of the individuals ahead of it; fourth, select the first individual whose accumulated share is greater than a random number \( r_{s} \in [0,1] \); finally, repeat the above steps until enough individuals have been selected. As can be seen, individuals with greater fitness have a larger probability of being selected to produce the next generation, which is consistent with the principles of evolution. A purely random selection policy may also be used when needed.
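A minimal sketch of one common formulation of these steps (accumulating each individual's own share as well; the helper name is ours):

```python
import random

def roulette_select(population, fitnesses):
    """Roulette wheel selection: individuals with a larger share of the total
    fitness are more likely to be copied into the parent population."""
    total = sum(fitnesses)
    shares = [f / total for f in fitnesses]          # fitness percent of each individual
    order = sorted(range(len(population)), key=lambda i: shares[i], reverse=True)
    selected = []
    while len(selected) < len(population):
        r = random.random()                          # r_s in [0, 1]
        cumulative = 0.0
        for i in order:
            cumulative += shares[i]
            if cumulative > r:                       # first individual whose summed share exceeds r_s
                selected.append(population[i])
                break
        else:
            selected.append(population[order[-1]])   # guard against rounding error
    return selected
```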

Crossover generally uses single-point crossover, with a probability typically set between 0.5 and 0.99. Two-point or three-point crossover may also be used as needed.

Mutation changes some genes of a selected chromosome to generate a new individual. Mutation operators generally use random variation, with a probability typically set between 0.001 and 0.1.
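Sketches of single-point crossover and random bit mutation under the binary encoding assumed above (our own minimal versions, operating on whole populations):

```python
import random

def crossover(population, pc):
    """Single-point crossover: with probability pc, each consecutive pair of
    chromosomes exchanges its tails at a random cut point."""
    offspring = list(population)
    for i in range(0, len(population) - 1, 2):
        if random.random() < pc:
            a, b = population[i], population[i + 1]
            point = random.randint(1, len(a) - 1)    # cut point strictly inside the string
            offspring[i] = a[:point] + b[point:]
            offspring[i + 1] = b[:point] + a[point:]
    return offspring

def mutate(population, pm):
    """Random mutation: each gene (bit) is flipped independently with probability pm."""
    return [[1 - g if random.random() < pm else g for g in chrom]
            for chrom in population]
```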

Note that if the target path is difficult to hit, a large selection pressure should be maintained to accelerate the convergence of the genetic algorithm; in this case, we tend to use strategies with greater selective pressure, such as roulette wheel selection and optimal survival. Conversely, if the target path is easy to hit, we tend to use strategies with smaller pressure, such as random selection [14].

4 Simulation Experiment

4.1 Test Objects

We chose three different types of C language programs to verify the effectiveness of our method, as shown in Table 1.

Table 1. Description of the tested software

4.2 Performance Indicators

We use the hit rate \( p_{hit} \), the average number of evolution rounds \( G_{hit} \) and the total number of test data \( Total \) as performance indicators of the proposed method.

Suppose the total number of experiments is \( C_{0} \), the number of successful experiments (those that generated test data meeting the objective) is \( C_{hit} \), the population size is \( N \), the maximum number of evolution rounds is \( G_{\hbox{max} } \), and the evolution round at which the i-th satisfying test datum was obtained is \( G_{hit}^{i} \). The hit rate \( p_{hit} \), the average number of evolution rounds \( G_{hit} \) and the total number of test data \( Total \) are defined as follows:

$$ p_{hit} = \frac{{C_{hit} }}{{C_{0} }} $$
(4)
$$ G_{hit} { = }\frac{{\sum\limits_{i = 1}^{{C_{hit} }} {G_{hit}^{i} } }}{{C_{hit} }} $$
(5)
$$ Total{ = }N \times \left( {G_{hit} \times p_{hit} + G_{\hbox{max} } \times \left( {1 - p_{hit} } \right)} \right) $$
(6)

From these definitions we can see that the higher the hit rate, the greater the probability of successfully generating test data; the smaller the average number of evolution rounds, the faster the algorithm converges; and the smaller the total number of test data, the higher the efficiency of the algorithm. Note that a complete hit here means generating a test data set that covers each output with at least one test datum.
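For illustration (the helper and argument names are ours), formulas (4)–(6) can be computed directly from an experiment log as follows:

```python
def performance_indicators(hit_rounds, total_runs, pop_size, max_generation):
    """Compute p_hit, G_hit and Total from formulas (4)-(6).
    `hit_rounds` holds, for each successful experiment, the evolution round at
    which a satisfying test datum was obtained."""
    c_hit = len(hit_rounds)
    p_hit = c_hit / total_runs                                         # formula (4)
    # formula (5); falling back to max_generation when there is no hit is our own convention
    g_hit = sum(hit_rounds) / c_hit if c_hit else max_generation
    total = pop_size * (g_hit * p_hit + max_generation * (1 - p_hit))  # formula (6)
    return p_hit, g_hit, total
```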

4.3 Results Analysis

The main purpose of the experiments is to use the hit rate and other indicators to examine the performance of the proposed method and to compare the results with random testing. The main experimental procedure is as follows:

First, set the evolutionary testing strategies of the proposed method. For all three tested programs, we used binary encoding, roulette wheel selection, random survival, single-point crossover and random mutation. The basic strategies are shown in Table 2.

Table 2. Strategies of evolutionary testing

Then, for each tested program, we configured the basic parameters of the genetic algorithm, such as the population size; the parameters are shown in Table 3.

Table 3. Parameters of genetic algorithm

Finally, for each tested program, we conducted several experiments and computed the hit rate \( p_{hit} \), the average number of evolution rounds \( G_{hit} \) and the total number of test data \( Total \), comparing them with random testing (\( N \) groups in every round, \( G_{hit} \) rounds in every experiment). As can be seen from Table 4, the proposed method has clear advantages.

Table 4. Partial experimental results and comparison (15 experiments)

5 Conclusion

Covering the software output domains is an important part of functional testing. However, since automatically generating test data for a given output from the software requirements specification is very difficult, there are currently few practical approaches with good operability.

By transforming the test data generation problem into a numerical optimization problem and taking program path coverage ability as the optimization goal, evolutionary testing has already produced good research results in structural testing.

This paper gave a new idea for solving the output domain coverage problem. By making use of gray-box technology, it transforms coverage of the software output domains into coverage of branches of a pseudo-path, which makes it feasible to apply evolutionary testing methods from structural testing to functional testing. Experiments showed that the proposed method is superior to random testing.

Next, we will compare the effects of different evolutionary testing methods from structural testing to find the best method for addressing specific output domain coverage issues in functional testing.