
1 Introduction

Software testing is one of the most crucial phases in the Software Development Life Cycle (SDLC). For any software to succeed, it must be tested thoroughly with the intent of finding the maximum number of defects. Software testing consumes approximately 50% of the software development effort [1]. Changing business requirements and market trends force the software to undergo changes even after it is delivered and becomes operational. Regression testing verifies the correctness of the changed software and its affected parts after it becomes operational. Retesting the whole software demands a huge set of resources, which is a major constraint in terms of time and money.

Test case prioritization (TCP) arranges test cases by priority value so that the test cases with the highest priority are executed first. Test cases can be prioritized either randomly or based on the branch or statement coverage [2] of the software. Prioritizing test cases optimizes the time required to reach a performance goal [3]. The potential benefit of prioritization is that it ranks the test cases rather than reducing or discarding them from the test suite [4].

Particle Swarm Optimization (PSO) is a stochastic population-based optimization technique introduced by Kennedy and Eberhart in 1995 [5] that is motivated by the social behavior of fish schooling and bird flocking. The population in PSO consists of particles that fly through the problem domain, or search space, to reach an optimum solution. PSO is widely used because of its implementation simplicity, scalability in dimensions and performance in terms of empirical solutions for global search problems [6]. It has been used to solve a number of complex problems such as the permutation flow-shop sequencing problem [5], real-time embedded systems [7], the capacitor problem [8], soft sensors [9] and generating test data on the basis of data flow coverage [10].

In this paper, we have implemented PSO with three different Benchmark Functions (BFs) for calculating the fitness value of the test cases. BFs are used to find optimal solutions for optimization problems and to compare and analyze the effectiveness of different optimization algorithms. In this work, we have used three single-objective BFs: Sphere [5, 11], Rastrigin [5] and Griewank [5, 11, 12]. For a set of fifteen JAVA programs we first determine the number of independent paths by drawing the control-flow graph (CFG) and the decision-to-decision (DD) graph. For every independent path a set of fifty test cases is generated, which are then prioritized using PSO. The results are derived from the time taken and the global best value obtained while prioritizing the test cases with each BF. The results prove that Griewank works best with PSO, as it takes approximately 8.90% of the average time taken by the Sphere function to prioritize the test cases. It also takes approximately 64.17% of the average time taken by the Rastrigin function.

The paper proposes a way to minimize the testing effort by substituting different fitness functions for calculating the fitness of the test cases based on their position. This work suggests using Griewank as the BF with PSO to prioritize test cases, as it helps save considerable time, effort and cost during critical project deliveries when resource availability is tight. In this work, the test cases are prioritized and not reduced or removed from the test suite.

The rest of the paper is organized as follows: Sect. 2 discusses the literature review. Section 3 covers the concept of test suite prioritization using branch and statement coverage. Section 4 explains the PSO algorithm and the BFs in detail. Section 5 covers a case study on TCP. Section 6 derives results based on the case study. Section 7 discusses threats to validity. The last section concludes the paper and suggests future work.

2 Related Work

Extensive research has been carried out on test suite prioritization, selection and minimization using different nature-inspired meta-heuristic algorithms [13], and considerable work has been done on test case prioritization, selection and test data generation using PSO in particular. In 2014, Mor [4] evaluated the effectiveness of different test suite prioritization techniques using the Average Percentage of Faults Detected (APFD) metric. In 2006, Walcott [14] gave an algorithm for time-constraint-aware prioritization. In 2014, Sharma [15] modified the algorithm to prioritize test cases based on timing and the APFD metric. In 2006, Hassan et al. [12] worked on the Genetic Algorithm (GA) and PSO and showed that PSO works best among the two. Elbeltagia et al. [6] worked on GA, the Memetic Algorithm (MA), Ant Colony Optimization (ACO) and PSO and showed that PSO performs best among the four algorithms. In 2010, Nayak [16] used PSO for automatic test data generation for data flow testing. In 2014, Chawla [17] gave a hybrid soft-computing algorithm for test data generation using PSO and GA.

Harman [18] surveyed the area of regression test minimization, prioritization and selection and suggested potential areas and scope for future work. In 2011, Kaur [19] blended PSO with a crossover operator to avoid convergence of the population to a local best. In May 2011, Arora [20] gave a Hybrid Particle Swarm Optimization (HPSO) that combines PSO and GA to increase the search space. Routing, job-scheduling and task-scheduling problems [21] have been solved using PSO by many researchers in distributed environments to execute tasks in an efficient and cost-effective manner. El-Sherbiny [5] gave an algorithm for particle swarm optimization that does not use velocity for calculating the position of a particle. In 2004, Yang [22] used PSO to solve NP-hard problems such as the knapsack problem. In 2017, Kumar [23] used PSO to solve NP-complete problems such as test suite generation. In 2007, Hendtlass [24] worked on the PSO algorithm with a focus on counting the total number of evaluations required to calculate fitness.

As far as the literature survey is concerned, El-Sherbiny [5] gave an algorithm inspired by PSO for optimization problems that works without calculating the velocity at which a particle moves in the search space; he efficiently reduced the number of iterations needed to find the best solution to an optimization problem. In 2014, Mor [4] used the APFD metric to measure the rate of fault detection while prioritizing test cases based on the coverage criteria provided by different prioritization techniques, and concluded that higher APFD values indicate a better rate of fault detection. In 2006, Walcott [14] gave an algorithm for prioritizing test cases using GA in a time-constrained environment for systems such as PlanetLab and MonetDB that use the concept of nightly builds and unit testing of the software. In 2010, Nayak [16] simulated GA and PSO to generate automatic test data for data flow testing; his work shows that PSO outperforms GA by 100% in def-use coverage. In 2014, Chawla [17] proposed a hybrid algorithm based on PSO and GA to automate test data generation; the effectiveness of the algorithm was confirmed using the percentage of fault coverage per unit of time and the percentage of faults detected by the generated test cases.

In 2011, Kaur [19] gave a Hybrid Particle Swarm Optimization (HPSO) algorithm that combines the techniques of PSO and GA to widen the search space; in HPSO the initial population of PSO is mutated before performing the rest of the PSO steps to improve the average percentage of faults detected. In 2010, Harman [18] performed both a detailed analysis and a survey in the field of regression test prioritization, selection and minimization, and his work suggested how the three terms are related to each other in terms of implementation. In 2012, Ming et al. [25] combined mutative-scale chaos with PSO to mitigate the problems of slow convergence and local optimum points in the PSO algorithm. In 2007, Hendtlass [24] used three fitness functions, Sphere, Rastrigin and Schwefel's function, in different numbers of dimensions and showed that the best fitness evaluation is found using Schwefel's function. In this paper, we have referred to research work done after the year 2000 on test data generation, prioritization and optimization during software testing.

3 Test Case Prioritization

To make regression testing cost-effective we need to prioritize the test cases in such a manner that, after executing the prioritized test cases, the changed part of the software is tested effectively. Test case coverage describes how well the code is covered by a given test suite. Therefore, while testing the software it becomes essential to design an effective test suite that uncovers the maximum number of defects. To save testing effort, the test cases need to be ranked based on criteria that fulfill the prerequisite of uncovering the maximum number of defects in the updated software. Different prioritization techniques are used for prioritizing a test suite, but in this work we focus on the two aspects of test case coverage [7] described below.

3.1 Prioritizing Based on Statement Coverage

Statement coverage (SC) is based on the concept of executing all the statements in the code at least once for a given set of requirements. SC can be achieved with a basic knowledge of the code structure and by using a flowchart of the software workflow. SC is represented by a metric that captures the number of statements covered by a test case in a code block. For any program, SC is calculated using the formula specified in Eq. (1). For a given procedure P, as shown in Fig. 1a, the number of statements covered by a set of three test cases is represented using the statement coverage matrix in Fig. 1b.

Fig. 1 For a program P shown in (a) that consists of two branch nodes, the corresponding matrices for calculating the statement and branch coverage of the program are depicted in (b) and (c) respectively

$$\text{Statement Coverage} = \frac{\text{Number of executed statements}}{\text{Total number of statements}} \times 100$$
(1)

3.2 Prioritizing Based on Branch Coverage

Branch coverage (BC) is also known as all-edges coverage or decision coverage. BC focuses on covering all the branches, that is, all the true and false outcomes of the conditions in the code. Prioritizing test cases on the basis of BC ranks the test cases by the total number of decision or branch nodes they cover. In this study we have designed the test cases in such a manner that all the branch conditions in a program are covered. A test suite that covers all the decision nodes in the software automatically gives 100% SC. The BC for any given software can be calculated using the formula specified in Eq. (2). For the procedure P in Fig. 1a, the BC is represented by listing all the branches and the corresponding test cases that cover them, as shown in Fig. 1c.

$$\text{Branch Coverage} = \frac{\text{Number of decision outcomes exercised}}{\text{Total number of decision outcomes}} \times 100$$
(2)
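
As a small illustration of Eqs. (1) and (2), the Java sketch below computes both metrics; the coverage counts would normally come from instrumentation, and the numbers used in main() are hypothetical.

```java
// Sketch: computing the coverage metrics of Eqs. (1) and (2).
// The raw counts would normally come from code instrumentation;
// the numbers used in main() are hypothetical.
public class CoverageMetrics {

    // Eq. (1): statement coverage in percent
    static double statementCoverage(int executedStatements, int totalStatements) {
        return (executedStatements / (double) totalStatements) * 100.0;
    }

    // Eq. (2): branch (decision) coverage in percent
    static double branchCoverage(int exercisedOutcomes, int totalOutcomes) {
        return (exercisedOutcomes / (double) totalOutcomes) * 100.0;
    }

    public static void main(String[] args) {
        // e.g. a test case that executes 28 of 32 statements
        // and exercises 3 of 4 decision outcomes (two branch nodes)
        System.out.printf("Statement coverage: %.1f%%%n", statementCoverage(28, 32));
        System.out.printf("Branch coverage:    %.1f%%%n", branchCoverage(3, 4));
    }
}
```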

4 Particle Swarm Optimization

PSO is an evolutionary process derived from the social behavior of birds migrating to a destination that is currently unknown [1]. Each bird in the swarm is called a particle, and for each particle we need to optimize its position in the swarm. In PSO every particle has a position and a velocity associated with it in an n-dimensional space. Every particle strives for two best values: the first, called pBest, is the best fitness value achieved by the particle so far, and the second, called gBest, is the best fitness value achieved so far among all the particles.

In PSO the size of the swarm population is denoted as S, S ∈ R (S belongs to the set of real numbers), and the position and velocity vectors are defined as follows: Xi = (xi1, xi2, xi3, …, xik) and Vi = (vi1, vi2, vi3, …, vik). On the basis of the position of a particle in the swarm, its fitness is calculated using a fitness function and is denoted as Fi = (fi1, fi2, fi3, …, fik). The fittest particle found at a given point of time is denoted as Fg = (fg1, fg2, fg3, …, fgk).

A typical PSO scenario is represented in Fig. 2 for two particles i and j that fly through the search space to reach the gBest. To understand the movement of a particle in the swarm towards gBest, consider how particle i moves in the search space. Particle i has an initial position and velocity denoted by Xn and Vn. It moves towards the gBest by updating its position to Xn+1 and its resultant velocity to Vn+1. The best position attained by particle i so far is maintained in the variable pBest.

Fig. 2 The typical movement of two particles X(i)n and X(j)n in a swarm, each updating its position and velocity to reach the global best value

A pseudo code for a generic PSO algorithm is depicted below. The algorithm begins by initializing the swarm population with velocity, position and fitness values. After the population is initialized, each particle's velocity and position are updated for a number of iterations. The algorithm updates the pBest of a particle if pBest is greater than the particle's current fitness value. Similarly, if pBest is less than gBest, the algorithm updates gBest with the pBest value of the current particle.

Pseudo code: Generic PSO Algorithm
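
The authors' pseudo code appears as a figure in the original paper and is not reproduced here; the following Java code is a minimal illustrative sketch of the steps just described (not the original listing), assuming a one-dimensional minimization problem, a standard velocity update, and hypothetical parameter values.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

// Minimal generic PSO sketch (one dimension) for a minimization problem.
// Parameter values (swarm size, iterations, w, c1, c2, bounds) are hypothetical.
public class GenericPso {
    public static void main(String[] args) {
        final int swarmSize = 50, iterations = 20;
        final double w = 0.7, c1 = 1.5, c2 = 1.5, minX = 0.0, maxX = 100.0;
        DoubleUnaryOperator fitness = x -> x * x;   // example cost function (Sphere)
        Random rand = new Random();

        double[] x = new double[swarmSize], v = new double[swarmSize];
        double[] pBestX = new double[swarmSize], pBestCost = new double[swarmSize];
        double gBestX = 0.0, gBestCost = Double.MAX_VALUE;

        // initialize position, velocity and the personal/global bests
        for (int i = 0; i < swarmSize; i++) {
            x[i] = minX + rand.nextDouble() * (maxX - minX);
            v[i] = 0.0;
            pBestX[i] = x[i];
            pBestCost[i] = fitness.applyAsDouble(x[i]);
            if (pBestCost[i] < gBestCost) { gBestCost = pBestCost[i]; gBestX = pBestX[i]; }
        }

        // main PSO loop: update velocity and position, then the bests
        for (int t = 0; t < iterations; t++) {
            for (int i = 0; i < swarmSize; i++) {
                v[i] = w * v[i]
                     + c1 * rand.nextDouble() * (pBestX[i] - x[i])
                     + c2 * rand.nextDouble() * (gBestX - x[i]);
                x[i] += v[i];
                double cost = fitness.applyAsDouble(x[i]);
                if (cost < pBestCost[i]) { pBestCost[i] = cost; pBestX[i] = x[i]; }
                if (pBestCost[i] < gBestCost) { gBestCost = pBestCost[i]; gBestX = pBestX[i]; }
            }
        }
        System.out.printf("gBest position = %.5f, gBest cost = %.5f%n", gBestX, gBestCost);
    }
}
```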

4.1 Mapping PSO to Test Suite Prioritization

In this work, we have used PSO for prioritizing test cases to save testing effort during regression testing. To understand how PSO fits the Test Suite Prioritization (TSP) problem, this section explains the mapping of PSO to TSP. The test cases in the test suite are equivalent to particles in the swarm. For every test case we calculate the initial velocity and position using the formulas defined in Eqs. (3) and (4) respectively, and the initial velocity of the test case also incorporates the average value of the test data. The aim is to reduce the cost associated with the test case, which is equivalent to the fitness value in the Generic PSO Algorithm given in the pseudo code above. The initial fitness value of the test case is calculated using the formula specified in Eq. (5), which takes the position of the test case and the dimension as input.

$$v_{ik} = (\text{maxX} - \text{minX}) \times \text{rand} + \text{minX} + \text{mean}$$
(3)
$$x_{ik} = (\text{maxX} - \text{minX}) \times \text{rand} + \text{minX} + \text{mean} + v_{ik}$$
(4)
$$f_{ik} = \text{CostFunction}(x_{ik}, \text{dimension})$$
(5)
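
A minimal Java sketch of this initialization step, following Eqs. (3)-(5) as written above; the class and method names are ours, and the placeholder cost function (Sphere in one dimension) stands in for whichever benchmark function of Sect. 4.2 is used.

```java
import java.util.Random;

// Sketch of test-case initialization per Eqs. (3)-(5).
// maxX, minX and mean are assumption values; costFunction stands for
// one of the benchmark functions used to compute fitness.
class TestCaseInit {
    static final Random RAND = new Random();

    static double initVelocity(double maxX, double minX, double mean) {
        return (maxX - minX) * RAND.nextDouble() + minX + mean;             // Eq. (3)
    }

    static double initPosition(double maxX, double minX, double mean, double velocity) {
        return (maxX - minX) * RAND.nextDouble() + minX + mean + velocity;  // Eq. (4)
    }

    static double initFitness(double position, int dimension) {
        return costFunction(position, dimension);                           // Eq. (5)
    }

    // placeholder for a benchmark function (e.g. Sphere: x^2 in one dimension)
    static double costFunction(double position, int dimension) {
        return position * position;
    }
}
```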

The process of finding a candidate solution is repeated for a set of iterations, and during each iteration a particle updates its velocity and position on the basis of the formulas given in Eqs. (6) and (7) [7] respectively.

$$V_{ik+1} = w \cdot V_{ik} + c_1 \cdot \text{rand} \cdot (f_{ik} - x_{ik}) + c_2 \cdot \text{rand} \cdot (f_{gk} - x_{ik}) + \text{mean}$$
(6)

where

V_ik: velocity of particle i at the current iteration
w: weighing function
c1, c2: weighing factors
f_ik: particle's current best position
x_ik: particle's current position
f_gk: global best
rand: random number in the range [0, 1]

$$X_{ik+1} = x_{ik} + v_{ik+1}$$
(7)

The value of the weighing function w is calculated using the formula given in Eq. (8); it indicates how well the particle is moving towards the best solution. A small code sketch of the update step defined by Eqs. (6)-(8) is given after the parameter definitions below.

$$w = \text{minX} - (t / \text{MaxGeneration}) \times (\text{maxX} - \text{minX})$$
(8)

where

minX: initial weight
maxX: final weight
MaxGeneration: total number of iterations
t: current iteration
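
The update step of Eqs. (6)-(8) can be sketched in Java as follows, using the variable names defined above; the interpretation of minX and maxX in Eq. (8) as the initial and final weight follows the definitions above, and the helper class name is ours.

```java
import java.util.Random;

// Sketch of the velocity, position and weight updates of Eqs. (6)-(8).
// Variable names follow the definitions above; concrete values are hypothetical.
class PsoUpdateStep {
    static final Random RAND = new Random();

    // Eq. (8): weighing function for iteration t (minX = initial, maxX = final weight)
    static double weight(double minX, double maxX, int t, int maxGeneration) {
        return minX - ((double) t / maxGeneration) * (maxX - minX);
    }

    // Eq. (6): new velocity of particle i in dimension k
    static double newVelocity(double vIk, double w, double c1, double c2,
                              double fIk, double fGk, double xIk, double mean) {
        return w * vIk
             + c1 * RAND.nextDouble() * (fIk - xIk)
             + c2 * RAND.nextDouble() * (fGk - xIk)
             + mean;
    }

    // Eq. (7): new position of particle i in dimension k
    static double newPosition(double xIk, double vIkPlus1) {
        return xIk + vIkPlus1;
    }
}
```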

The parameters and variables that are used in implementing PSO for test suite prioritization are listed in Table 1.

Table 1 PSO parameters

4.2 Fitness Calculation in PSO

The fitness or cost value of a particle in PSO can be calculated using different single- or multi-objective functions. The objective function, also known as a benchmark function, is used to evaluate the effectiveness of optimization algorithms based on factors such as performance, convergence rate, precision and robustness.

In this work, the BFs are used to calculate the cost value associated with a test case based on the position and velocity of the test case in the test suite. We have used the three single-objective functions shown in Table 2 [11, 24] to calculate the fitness of the test cases using the formula specified in Eq. (5). The performance of the three functions is compared in terms of time to identify which works best with PSO for prioritizing the test cases.

Table 2 Benchmark functions

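Table 2 lists the three benchmark functions. Their standard single-objective forms, which we assume match the expressions in Table 2, can be sketched in Java as follows.

```java
// Sketch of the three benchmark functions in their standard single-objective
// forms; the exact expressions in Table 2 are assumed to match these.
class BenchmarkFunctions {

    // Sphere: sum of squares
    static double sphere(double[] x) {
        double sum = 0.0;
        for (double xi : x) sum += xi * xi;
        return sum;
    }

    // Rastrigin: 10n + sum(xi^2 - 10 cos(2*pi*xi))
    static double rastrigin(double[] x) {
        double sum = 10.0 * x.length;
        for (double xi : x) sum += xi * xi - 10.0 * Math.cos(2.0 * Math.PI * xi);
        return sum;
    }

    // Griewank: 1 + sum(xi^2 / 4000) - prod(cos(xi / sqrt(i)))
    static double griewank(double[] x) {
        double sum = 0.0, prod = 1.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * x[i] / 4000.0;
            prod *= Math.cos(x[i] / Math.sqrt(i + 1));
        }
        return 1.0 + sum - prod;
    }
}
```

All three functions have a global minimum of 0 at the origin, which is why a smaller gBest value indicates a fitter (lower-cost) test case in the prioritization.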

4.3 Proposed Algorithm

The proposed algorithm for prioritizing test cases using PSO is given in pseudo code below [25, 26]. The process of prioritizing test cases begins by identifying the independent paths in a program by drawing its CFG and DD-graph. For the identified paths, test cases are generated based on the decision nodes in the code, which guarantees 100% BC and SC [7].

Pseudo code: Proposed PSO Algorithm for Test Suite Prioritization

The algorithm begins by initializing the velocity, position and fitness of each test case on the path using Eqs. (3), (4) and (5). The initial pBest position and pBest cost of a test case are the same as the test case's position and cost. At step 11, the counter iterates for a pre-defined number of iterations. In every iteration the velocity and position of the test cases are updated using Eqs. (6) and (7). If the pBest fitness value of a test case is greater than its current fitness value, pBest is updated with the current fitness of the test case. Similarly, the algorithm compares the fitness values of gBest and pBest and updates gBest with the pBest fitness of the test case. The algorithm also incorporates the dimension variable while iterating over the test cases.
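
A compact, standalone Java sketch of the prioritization loop described above for a single independent path; the parameter values, the use of the one-dimensional Griewank function as the cost function, and the reading of f_ik and f_gk in Eq. (6) as the personal and global best positions are our assumptions, not the authors' exact listing.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

// Standalone sketch of the proposed prioritization for one independent path.
// Fifty test cases are initialized per Eqs. (3)-(5), updated for a fixed number
// of iterations per Eqs. (6)-(8), and finally sorted by increasing pBest cost.
// All parameter values and the one-dimensional Griewank cost are assumptions.
public class PsoTestCasePrioritizer {
    static final Random RAND = new Random();
    static final double MIN_X = 0.0, MAX_X = 100.0;     // input range of the test data
    static final double C1 = 1.5, C2 = 1.5;             // hypothetical weighing factors
    static final double W_INIT = 0.9, W_FINAL = 0.4;    // hypothetical initial/final weight
    static final int ITERATIONS = 20, TEST_CASES = 50;

    static class TestCase {
        int id; double x, v, cost, pBestX, pBestCost;
    }

    static double griewank(double x) {                  // Griewank in one dimension
        return 1.0 + (x * x) / 4000.0 - Math.cos(x);
    }

    public static void main(String[] args) {
        double mean = 50.0;                              // mean of the generated test data (assumed)
        TestCase[] suite = new TestCase[TEST_CASES];
        double gBestCost = Double.MAX_VALUE, gBestX = 0.0;

        for (int i = 0; i < TEST_CASES; i++) {           // initialization, Eqs. (3)-(5)
            TestCase tc = new TestCase();
            tc.id = i + 1;
            tc.v = (MAX_X - MIN_X) * RAND.nextDouble() + MIN_X + mean;
            tc.x = (MAX_X - MIN_X) * RAND.nextDouble() + MIN_X + mean + tc.v;
            tc.cost = griewank(tc.x);
            tc.pBestX = tc.x;
            tc.pBestCost = tc.cost;
            if (tc.pBestCost < gBestCost) { gBestCost = tc.pBestCost; gBestX = tc.pBestX; }
            suite[i] = tc;
        }

        for (int t = 0; t < ITERATIONS; t++) {           // main loop
            double w = W_INIT - ((double) t / ITERATIONS) * (W_FINAL - W_INIT);     // Eq. (8)
            for (TestCase tc : suite) {
                tc.v = w * tc.v + C1 * RAND.nextDouble() * (tc.pBestX - tc.x)
                                + C2 * RAND.nextDouble() * (gBestX - tc.x) + mean;  // Eq. (6)
                tc.x = tc.x + tc.v;                                                 // Eq. (7)
                tc.cost = griewank(tc.x);
                if (tc.cost < tc.pBestCost) { tc.pBestCost = tc.cost; tc.pBestX = tc.x; }
                if (tc.pBestCost < gBestCost) { gBestCost = tc.pBestCost; gBestX = tc.pBestX; }
            }
        }

        // prioritized suite: increasing order of pBest cost
        Arrays.sort(suite, Comparator.comparingDouble((TestCase tc) -> tc.pBestCost));
        for (TestCase tc : suite) System.out.printf("TC%-3d pBest = %.5f%n", tc.id, tc.pBestCost);
        System.out.printf("gBest = %.5f%n", gBestCost);
    }
}
```

Sorting by increasing pBest cost matches the flow chart in Sect. 4.4, where the prioritized suite is returned in increasing order of pBest.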

4.4 Flow Chart of Proposed Algorithm

The flow chart for the algorithm proposed in this work is shown in Fig. 3. The algorithm begins by initializing the test cases, or swarm population, with random values. The global best is assigned a maximum value, and the aim is to minimize the value of gBest over a set of iterations. The result achieved at the end of the algorithm is a prioritized set of test cases sorted in increasing order of pBest.

Fig. 3 Flow chart for PSO prioritization

5 Analysis and Evaluation

This section describes the data collection steps followed by the implementation of PSO for a case study, to derive results and show how test cases are prioritized on the basis of their fitness values using the different BFs.

5.1 Data Collection

In this work, we have implemented PSO for a set of fifteen JAVA programs using Eclipse as the Integrated Development Environment (IDE). The test programs, along with their number of inputs and independent paths, are listed in Table 3. The PSO algorithm is run for twenty iterations for every test path and the dimension value used is 1.

Table 3 Test programs

5.2 Case Study: Calculator

In this work, a calculator program is used as a case study for understanding the PSO implementation; its code is shown in Fig. 4. The program has 32 Lines of Code (LOC) and expects three inputs from the user: input 1 specifies the operation, and input 2 and input 3 are the integers on which the operation is performed. The DD-graph, shown in Fig. 5, serves as the basis for generating the test cases from the branch nodes.

Fig. 4 Calculator program implemented in JAVA

Fig. 5 DD-graph for program calculator.java

Table 4 shows the independent paths for the DD-graph drawn in Fig. 5. For every identified path in Table 4, a set of 50 test cases is generated, using the decision nodes as boundary criteria.

Table 4 Independent paths for calculator.java

Table 5 shows two test cases for each path based on the boundary conditions, along with the average value of the test data. The mean value is used for initializing the velocity of the test case using Eq. (3). The values for input 2 and input 3 are in the range [0, 100].

Table 5 Test data for the independent paths

5.3 Empirical Results

The proposed PSO algorithm (Sect. 4.3) is run on all the independent paths of Table 4. The algorithm takes the mean value of the generated test data for calculating the velocity, position and fitness of the test cases. At the end of the algorithm we obtain eighteen prioritized test suites in total, three for each path, one per BF applied to the suite. The results are not exactly repeatable, since the calculation of the PSO variables uses random numbers, and the time taken for prioritization may vary depending on the test environment and machine configuration.

Table 6 shows the prioritized test cases for the six paths of the calculator program using PSO with the three BFs. The results show the top ten prioritized test cases among the fifty test cases for each path. The gBest value achieved for each path using each BF is listed in the table, and the gBest for each path matches the pBest value of the first prioritized test case. The total time taken for prioritizing all the test paths using Sphere, Rastrigin and Griewank is summarized in the last row of the table.

Table 6 Test suite prioritized for calculator program using PSO

6 Results and Interpretation

To demonstrate the effectiveness of PSO with the specified BFs, we ran PSO on the fifteen JAVA programs under the same test environment to produce unbiased results. Table 7 lists the results derived in this work in terms of the time taken by the algorithm and the global best value reached for every independent path. For presentation simplicity, the global best value is shown up to five decimal places for all the identified paths. The last column of the table, the time taken for prioritization, is the sum of the times taken to prioritize each independent path of the program. The least time taken by PSO among the three functions is highlighted in bold in the table, and where a path does not exist the entry is marked X.

Table 7 Results (Time taken by the algorithm) and the global best value for every independent path

The time taken by Griewank is approximately 8.90% of the average time taken by the Sphere function and 64.17% of the average time taken by the Rastrigin function. The results show that Griewank works best among the three functions with PSO, considering both the execution time and the gBest values achieved in comparison to the optimal values of the benchmark functions specified in Table 2.

The bar graph for the results in Table 7 is shown in Fig. 6. The x-axis represents the 15 programs used in this work and the y-axis represents the mean time taken by PSO to prioritize the test cases for the selected programs using the three functions: Sphere, Rastrigin and Griewank. The results are not exactly repeatable, as the time may vary with the system configuration and the programming language used for the experiment.

Fig. 6 Time taken by benchmark functions with PSO

7 Threats to Validity

This empirical investigation is carried out in Eclipse, which may not be representative of all the software available in the market, but we have tried to the best of our ability to generalize our results for different scenarios. To make our results unbiased and generalizable, we followed the basic concepts of object-oriented architecture while implementing the fifteen programs in JAVA. The results presented in this empirical investigation would not be complete without a discussion of the threats to validity: external, internal and construct.

External validity is the degree to which the results can be generalized and includes the factors that impact our ability to generalize the results. The main threat to external validity in this work is that the test programs are small and medium-sized with similar fault patterns, which might not truly represent a larger scope of programs.

Internal validity is the degree to which changes in the independent variables affect the dependent variables. In this work, we have minimized this effect by selecting the random numbers used for calculating the PSO variables from the range [0, 1].

Construct validity is the degree to which the results are appropriately captured by the independent and dependent variables. The way random numbers are generated for calculating the PSO variables may vary depending on the framework used for JAVA programming. In this work, we have minimized this threat by using Eclipse for JAVA development, as it is widely used in many organizations for large and complex programs.

8 Conclusion and Future Work

In this work, we have used three BFs to calculate the fitness of test cases and then sorted the test suite based on the fitness values. We have verified the effectiveness of the Griewank function with PSO in terms of the global best value achieved and the time required for prioritizing test cases. The results were derived by running fifteen JAVA programs with the three BFs, and they show that Griewank works best with PSO in terms of the average time required for prioritizing test cases.

In future work, we plan to run the algorithm on programs with 1000 or more LOC and verify the results under the same test environment. The effectiveness of the suggested approach could be further improved by using more BFs or by combining two or more BFs while calculating the fitness of particles in the PSO algorithm. The algorithm can also be blended with other meta-heuristic algorithms such as GA, MA and ACO. A hybrid PSO can be designed that uses a combination of the three functions for test suite prioritization.