1 Introduction

Search Based Software Engineering (SBSE) is a promising sub-area within Software Engineering that reformulates Software Engineering problems as search-based optimisation problems (Clark et al. 2003; Harman 2007; Harman and Jones 2001). There has been much recent interest in SBSE, with over 800 papers on the topic and several recent surveys (Ali et al. 2010; Afzal et al. 2009; Räihä 2009; Harman et al. 2012).

One weakness shared by many SBSE techniques is their significant execution time. This drawback is inherent: many meta-heuristic optimisation algorithms are designed to evaluate a large number of candidate solutions before arriving at the set of proposed solutions. This process is often computationally demanding and can render an SBSE technique infeasible in practice, because it would simply require too much time to optimise complex real-world Software Engineering problems. Lack of scalability has been shown to be an important barrier to wider uptake of Software Engineering research (Cordy 2003; Chau and Tam 1997; Premkumar and Potter 1995).

Fortunately, many of the algorithms used in SBSE, such as the most widely used evolutionary algorithms (Harman et al. 2012), are classified as ‘embarrassingly parallel’ because of their inherent potential for parallelism. The high computational requirement created by the need to examine many solutions does not have to entail a long elapsed time, because the evaluations can be performed in parallel. The computation process (fitness evaluation) for each candidate solution is identical, making the overall process well suited to Single Instruction Multiple Data (SIMD) parallelism.

The use of multicore computing is rapidly becoming commonplace, with very widely available and inexpensive platforms that offer several hundreds of processing elements that implement SIMD parallelism. Furthermore, it is likely that we shall see significant advances in such platforms in the near future, with thousands and perhaps tens of thousands of simple processing elements becoming available within the reach of ‘standard’ desktop computation.

Many SBSE fitness functions are ideally suited to such simple processing elements. However, there has been little work on multicore SBSE. The first authors to suggest the use of multicore SBSE were Mitchell et al. (2001), who used a distributed architecture to parallelise modularisation through the application of search-based clustering. Subsequently, Mahdavi et al. (2003) used a cluster of standard PCs to implement a parallel hill climbing algorithm. More recently, Asadi et al. (2010) used a distributed architecture to parallelise a genetic algorithm for the concept location problem. However, hitherto, no authors have used General Purpose computing on Graphics Processing Units (GPGPU) for SBSE.

In this paper we propose GPGPU SBSE: the use of GPGPU devices to achieve multicore execution of SBSE, using simple fitness computations mapped across the multiple processing elements of standard GPGPU architectures. We report results for the application of GPGPU SBSE to the multi-objective regression test selection problem (also known as test case minimisation).

In order to apply GPGPU to Software Engineering optimisation problems, the specialized graphics manipulation hardware needs to be harnessed for a purpose for which it was not originally designed. Fortunately, the recent trend towards ‘general purpose’ graphics processing units lends itself well to this task. The underlying firmware typically implements certain matrix manipulation operations very efficiently. The key to unlocking the potential of GPGPU therefore lies in the ability to reformulate the optimisation problem in terms of these specific matrix manipulation operations.

This paper focusses on the problem of Search Based Regression Testing, one problem in the general area of Search Based Software Testing. Regression Testing is concerned with the process of re-testing software after change. After each change to the system, the pool of available test data needs to be re-executed in order to check whether the change has introduced new faults. Regression Testing therefore seeks to answer the question ‘has the software regressed?’. There have been several survey papers on Regression Testing applications and techniques that provide a more detailed treatment (Harrold and Orso 2008; Yoo and Harman 2012a; Engström et al. 2009).

In search based regression testing, the goal is to use search based optimisation algorithms to find optimal sets of test cases (regression test suite minimisation Yoo and Harman 2007) or to order test cases for regression testing (regression test prioritisation Walcott et al. 2006; Li et al. 2007). This paper concentrates upon the former problem of regression test minimisation. Recent results have shown that this is a promising area of SBSE application; the results obtained from the SBSE algorithms have been shown to be human competitive (de Souza et al. 2010).

Fast regression test minimisation is an important problem for practical software testers, particularly where large volumes of testing are required on a tight build schedule. The IBM middleware product used as one of the systems in the empirical study in this paper is a case in point. While it takes over four hours to execute the entire test suite for this system, the typical smoke test scenario performed after each code submission is assigned only an hour or less of testing time, forcing the tester to select a subset of tests from the available pool. A multi-objective approach to test suite minimisation (Yoo and Harman 2007) provides an ideal solution, as it can recommend subsets of tests that can be executed within different time budgets. However, because the selection of smoke tests is not static and depends on the code submitted, the given time budget must account both for the computation involved in test suite minimisation and for running the tests. It is therefore important that test suite minimisation takes only a small fraction of the budget, thereby allowing sophisticated minimisation to be used on standard machines.

The paper modifies three widely used evolutionary algorithms (SPEA2, NSGA-II and the Two Archive algorithm) for the multi-objective regression test minimisation problem. The algorithms are modified to support implementation on a GPU by transforming the fitness evaluation of the population of individual solutions into a matrix multiplication problem, which is inherently parallel and lends itself very favourably to the GPGPU approach. This transformation to matrix multiplication is entirely straightforward and may well be applicable to other SBSE problems, allowing them to benefit from scale-ups similar to those reported in this paper.

The modified algorithms have been implemented using OpenCL technology, a framework for GPGPU. The paper reports the results of applying the parallelised GPGPU algorithms to 13 real-world programs, ranging from the widely studied but relatively small examples of the Siemens suite (Hutchins et al. 1994), through larger, more realistic examples from the Software-artifact Infrastructure Repository (SIR) for testing (Do et al. 2005), to a very large IBM middleware regression testing problem.

The primary contributions of the paper are as follows:

  1.

    The paper presents results for real-world instances of the multi-objective test suite minimisation problem. The results indicate that dramatic speed-up is achievable. For the systems used in the empirical study, speed-ups over 25× were observed. The empirical evidence suggests that, for the larger problems where scale-up is most needed, the degree of speed-up is most dramatic; a problem that takes over an hour using conventional techniques can be solved in minutes using the GPGPU approach. This has important practical ramifications because regression testing cycles are often compressed: overnight build cycles are not uncommon.

  2.

    The paper studies three different multi-objective evolutionary algorithms with both GPU- and CPU-based parallelisation methods to provide robust empirical evidence for the scalability conferred by the use of GPGPU. The GPGPU parallelisation technique maintained the same level of speed-up across all algorithms studied. The empirical evidence highlights the limitations of CPU-based parallelisation: with smaller problems, multi-threading overheads erode the speed-up, whereas with larger problems it fails to scale as well as GPU-based parallelisation.

  3.

    The paper explores the factors that influence the degree of speed-up achieved, revealing that both program size and test suite size are closely correlated with the degree of speed-up. The data fit well a model in which the speed-up achieved grows logarithmically with both program size and test suite size.

The rest of the paper is organised as follows. Section 2 presents background material on test suite minimisation and GPGPU-based evolutionary computation. Section 3 describes how the test suite minimisation problem is re-formulated for a parallel algorithm, which is described in detail in Section 4. Section 5 describes the details of the empirical study, the results of which are analysed in Section 6. Section 7 discusses threats to validity and Section 8 presents the related work. Section 9 concludes.

2 Background

Multi-objective Test Suite Minimisation

The need for test suite minimisation arises when the regression test suite of an existing software system grows to such an extent that it may no longer be feasible to execute the entire test suite (Rothermel et al. 2002b). In order to reduce the size of the test suite, any redundant test cases in the test suite need to be identified and removed. One widely accepted criterion for redundancy is defined in relation to the coverage achieved by test cases (Rothermel et al. 1998; Black et al. 2004). If the test coverage achieved by test case \(t_1\) is a subset of the test coverage achieved by test case \(t_2\), the execution of \(t_1\) is redundant as long as \(t_2\) is also executed. The aim of test suite minimisation is to obtain the smallest subset of test cases that are not redundant with respect to a set of test requirements. More formally, the test suite minimisation problem can be defined as follows (Yoo and Harman 2012a):

2.1 Test Suite Minimisation Problem

Given

A test suite, T, of m test cases; a set of l test requirements, \(\{r_1, \ldots, r_l\}\), that must be satisfied to provide the desired ‘adequate’ testing of the program; and subsets of T, the \(T_i\)s, one associated with each \(r_i\), such that any test case \(t_j\) belonging to \(T_i\) can be used to achieve requirement \(r_i\).

Problem

Find a minimal representative set, T′, of test cases from T that satisfies all of the \(r_i\)s.

The testing criterion is satisfied when every test requirement in \(\{r_1, \ldots, r_l\}\) is satisfied. A test requirement, \(r_i\), is satisfied by any test case, \(t_j\), that belongs to \(T_i\), a subset of T. If we represent test cases as vertices on the left side of a bipartite graph, requirements as vertices on the right side, and the satisfiability relationship as edges between the two sides, a representative set of test cases is a hitting set of the \(T_i\)s (i.e. a subset of vertices on the left, the union of whose connected right-side vertices equals the set of all requirements). Furthermore, in order to maximise the effect of minimisation, T′ should be the minimal hitting set of the \(T_i\)s. The minimal representative set problem is NP-complete, as is its dual, the minimal set cover problem (Garey and Johnson 1979).
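The redundancy criterion and the representative set check can be illustrated with a small sketch (our own toy example in Java; the class and method names are illustrative and do not come from the paper's implementation):

```java
import java.util.*;

public class Redundancy {
    // t1 is redundant with respect to t2 if coverage(t1) is a subset of coverage(t2)
    static boolean redundant(Set<Integer> covT1, Set<Integer> covT2) {
        return covT2.containsAll(covT1);
    }

    // A subset of tests is a representative set (a hitting set of the T_i's)
    // if the union of the requirements its tests cover equals all requirements.
    static boolean satisfiesAll(List<Set<Integer>> coverage,
                                Set<Integer> selected, int numRequirements) {
        Set<Integer> covered = new HashSet<>();
        for (int t : selected) covered.addAll(coverage.get(t));
        return covered.size() == numRequirements;
    }

    public static void main(String[] args) {
        // Three tests covering requirements {0}, {0,1} and {2} respectively
        List<Set<Integer>> cov = Arrays.asList(
            new HashSet<>(Arrays.asList(0)),
            new HashSet<>(Arrays.asList(0, 1)),
            new HashSet<>(Arrays.asList(2)));
        System.out.println(redundant(cov.get(0), cov.get(1)));  // first test is redundant
        System.out.println(satisfiesAll(cov, new HashSet<>(Arrays.asList(1, 2)), 3));
    }
}
```

In this toy instance the second and third tests form a minimal representative set: the first test covers no requirement that the second does not already cover.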

The NP-hardness of the problem encouraged the use of heuristics and meta-heuristics. The greedy approach (Offutt et al. 1995), as well as other heuristics for the minimal hitting set and set cover problems (Harrold et al. 1993; Chen and Lau 1995), has been applied to test suite minimisation, but these approaches were not cost-cognisant and dealt with only a single objective (test coverage). With the single-objective problem formulation, the solution to the test suite minimisation problem is a single subset of test cases that maximises test coverage with minimum redundancy.

Later, the problem was reformulated as a multi-objective optimisation problem (Yoo and Harman 2007). With the multi-objective problem formulation, the solution to the test suite minimisation problem is not just a single solution but a set of non-dominated, Pareto-efficient solutions. This set of solutions reveals the trade-off between test coverage and the cost of testing that is specific to the test suite in consideration. For example, with the solution to the multi-objective test suite minimisation problem, it is possible not only to know what the minimal subset that achieves the maximum test coverage is, but also to know how much test coverage is possible for any given testing budget.

Since the greedy algorithm may not always produce Pareto optimal solutions for multi-objective test suite minimisation problems, Multi-Objective Evolutionary Algorithms (MOEAs) have been applied (Maia et al. 2009; Yoo and Harman 2007). While this paper studies three selected MOEAs, the principle of parallelising fitness evaluation of multiple solutions in the population of an MOEA applies universally to any MOEA.

GPGPU and Evolutionary Algorithms

Graphics cards have become a compelling platform for intensive computation, with a set of resource-hungry graphics manipulation problems that have driven rapid advances in their performance and programmability (Owens et al. 2007). As a result, consumer-level graphics cards boast tremendous memory bandwidth and computational power. For example, the ATI Radeon HD4850 (the graphics card used in the empirical study in the paper), costing about $150 as of April 2010, provides a processing rate of 1,000 GFLOPS and 63.6 GB/s of memory bandwidth. Graphics cards are also improving in performance more rapidly than CPUs. It has been reported that the computational capabilities of graphics cards, measured by metrics of graphics performance, have compounded at an average yearly rate of 1.7× (rendered pixels/s) to 2.3× (rendered vertices/s) (Owens et al. 2007). This significantly outpaces the growth of traditional microprocessors; using the SPEC benchmark, the yearly growth rate of CPU performance has been measured at 1.4× by a recent survey (Ekman et al. 2005).

The disparity between the two platforms stems from their different architectures: CPUs are optimised for executing sequential code, whereas GPUs are optimised for executing the same instruction (the graphics shader) with data parallelism (different objects on the screen). This Single Instruction Multiple Data (SIMD) architecture facilitates hardware-controlled massive data parallelism, which yields higher performance for problems in which a large dataset must be subjected to the same operations.

It is precisely this massive data parallelism of General Purpose computing on Graphics Processing Units (GPGPU) that makes GPGPU an ideal platform for parallel evolutionary algorithms. Many of these algorithms require the calculation of fitness (single instruction) for multiple individual solutions in the population pool (multiple data). Early work has exploited this potential for parallelism with both single- and multi-objective evolutionary algorithms (Tsutsui and Fujimoto 2009; Wilson and Banzhaf 2009; Wong 2009). However, most existing evaluations have been performed on benchmark problems rather than practical applications.

3 Parallel Formulation of MOEA for Test Suite Minimisation

Parallel Fitness Evaluation

The paper considers, for parallelisation, a multi-objective test suite minimisation problem from existing work (Yoo and Harman 2007). In order to parallelise test suite minimisation, the fitness evaluation of a generation of individual solutions is re-formulated as a matrix multiplication problem. Instead of computing the two objectives (i.e. coverage of test requirements and execution cost) for each individual solution in turn, the solutions in the entire population are represented as a matrix, which is multiplied by another matrix that represents the trace data of the entire test suite. The result is a matrix that contains the information needed for both test goal coverage and execution cost. While the paper considers structural coverage as the test goal, the proposed approach is equally applicable to other testing criteria, whether coverage-based, such as data-flow coverage and functional coverage, or even manually generated, provided that there is a clear mapping between tests and the test requirements they achieve.

More formally, let matrix A contain the trace data that capture the test requirements achieved by each test: the number of rows of A equals the number of test requirements to be covered, l, and the number of columns of A equals the number of test cases in the test suite, m. Entry \(a_{i,j}\) of A stores 1 if the test goal \(f_i\) was executed (i.e. covered) by test case \(t_j\), and 0 otherwise.

$$ A = \left(\begin{array}{ccc} a_{1,1} & \ldots & a_{1,m}\\ a_{2,1} & \ldots & a_{2,m}\\ & \ldots & \\ a_{l,1} & \ldots & a_{l,m}\\ \end{array}\right) $$

The multiplier matrix, B, is a representation of the current population of individual solutions that are being considered by a given MOEA. Let B be an m-by-n matrix, where n is the population size of the given MOEA. Entry \(b_{j,k}\) of B stores 1 if test case \(t_j\) is selected by individual \(p_k\), and 0 otherwise. In other words, each column of matrix B corresponds to a vector of decision variables that denote the selected test cases.

$$ B = \left(\begin{array}{ccc} b_{1,1} & \ldots & b_{1,n}\\ b_{2,1} & \ldots & b_{2,n}\\ & \ldots & \\ b_{m,1} & \ldots & b_{m,n}\\ \end{array}\right) $$

The fitness evaluation of the entire generation is performed by the matrix multiplication C = A × B. Matrix C is an l-by-n matrix; entry \(c_{i,k}\) of C denotes the number of times test goal \(f_i\) was covered by the different test cases selected by individual \(p_k\).
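For illustration, the reformulation can be sketched sequentially as follows (a minimal Java sketch with a hypothetical 3-requirement, 4-test instance; it is not the paper's OpenCL implementation):

```java
import java.util.Arrays;

public class FitnessMatrix {
    // C = A x B: A is the l-by-m trace matrix, B is the m-by-n population matrix.
    static int[][] multiply(int[][] a, int[][] b) {
        int l = a.length, m = b.length, n = b[0].length;
        int[][] c = new int[l][n];
        for (int i = 0; i < l; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < m; j++)
                    c[i][k] += a[i][j] * b[j][k];
        return c;
    }

    public static void main(String[] args) {
        // a[i][j] = 1 iff test t_j covers test goal f_i (hypothetical data)
        int[][] a = {{1, 0, 1, 0},
                     {0, 1, 1, 0},
                     {0, 0, 0, 1}};
        // Two candidate solutions as columns: {t_1, t_3} and {t_2, t_4}
        int[][] b = {{1, 0}, {0, 1}, {1, 0}, {0, 1}};
        // c[i][k] counts how many tests selected by p_k cover f_i
        System.out.println(Arrays.deepToString(multiply(a, b)));
        // prints [[2, 0], [1, 1], [0, 1]]
    }
}
```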

Cost and Coverage

In order to incorporate the execution cost as an additional objective for the MOEA, the basic reformulation is extended with an extra row in matrix A. The new matrix, A′, is an (l + 1)-by-m matrix whose last row contains the cost of each individual test case. The extra row in A′ results in an additional row in C′ = A′ × B, as follows:

$$A' = \left(\begin{array}{ccc} a_{1,1} & \ldots & a_{1,m}\\ a_{2,1} & \ldots & a_{2,m}\\ & \ldots & \\ a_{l,1} & \ldots & a_{l,m}\\ \mathrm{cost}(t_1) & \ldots & \mathrm{cost}(t_m)\\ \end{array}\right) \qquad C' = \left(\begin{array}{ccc} c_{1,1} & \ldots & c_{1,n}\\ c_{2,1} & \ldots & c_{2,n}\\ & \ldots & \\ c_{l,1} & \ldots & c_{l,n}\\ \mathrm{cost}(p_1) & \ldots & \mathrm{cost}(p_n)\\ \end{array}\right) $$

By definition, an entry \(c_{l+1,k}\) in the last row of C′ is defined as \(c_{l+1,k} = \sum_{j=1}^{m}{a_{l+1,j} \cdot b_{j,k}} = \sum_{j=1}^{m}{\mathrm{cost}(t_j) \cdot b_{j,k}}\). That is, \(c_{l+1,k}\) equals the sum of the costs of all test cases selected (\(b_{j,k} = 1\)) by the k-th individual solution \(p_k\), i.e. \(\mathrm{cost}(p_k)\). Similarly, after the multiplication, the k-th column of matrix C′ contains the coverage of test requirements achieved by individual solution \(p_k\). However, this information needs to be summarised into a percentage coverage, using a step function f, as follows: \(\mathrm{coverage}(p_k) = \frac{\sum_{i=1}^{l}{f(c_{i,k})}}{l}\), where f(x) = 1 if x > 0 and f(x) = 0 otherwise. The role of the step function is to translate the linear sum of how many times a test goal has been covered into the boolean fact of whether it was covered or not.

The cost objective can be calculated as a part of the matrix multiplication because it can be computed linearly from the decision variable vectors (the columns of matrix B). Other objectives that share this linearity may also be computed by matrix multiplication. However, the coverage of test requirements requires a separate step: each column of C′ contains the number of times individual test requirements were covered by the corresponding solution, so calculating the coverage metric for a solution requires iterating over the corresponding column of C′. The coverage calculation is highly parallel in nature because each column can be iterated over independently, and it can therefore take advantage of the GPGPU architecture by running multiple threads.
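The extraction of the two objectives from C′ can be sketched in Java as follows (the class and method names are ours, and the matrix contents are a hypothetical example; the cost row and step function follow the definitions in this section):

```java
public class CostCoverage {
    // C' is (l+1)-by-n; rows 0..l-1 are coverage counts, the last row is cost.
    static double coverage(int[][] cPrime, int k) {
        int l = cPrime.length - 1;           // exclude the cost row
        int covered = 0;
        for (int i = 0; i < l; i++)
            if (cPrime[i][k] > 0) covered++; // step function f(x)
        return (double) covered / l;
    }

    static int cost(int[][] cPrime, int k) {
        return cPrime[cPrime.length - 1][k]; // already summed by the multiplication
    }

    public static void main(String[] args) {
        // Hypothetical C' for l = 3 test goals and n = 2 solutions
        int[][] cPrime = {{2, 0},
                          {1, 1},
                          {0, 1},
                          {7, 5}};           // cost(p_1) = 7, cost(p_2) = 5
        System.out.println(coverage(cPrime, 0)); // p_1 covers 2 of 3 goals
        System.out.println(cost(cPrime, 1));
    }
}
```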

4 Algorithms

This section presents the parallel fitness evaluation components for CPU and GPU and introduces the MOEAs that are used in the paper.

Parallel Matrix Multiplication Algorithm

Matrix multiplication is inherently parallelisable as the calculation for an individual entry of the product matrix does not depend on the calculation of any other entry. Algorithm 1 shows the pseudo-code of the parallel matrix multiplication algorithm using the matrix notation in Section 3.

Algorithm 1 uses one thread per element of matrix C′, resulting in a total of (l + 1)·n threads. Each thread is identified by a unique thread id, tid. Given a thread id and the width of matrix B, \(w_B\) (which is also the width of C′), Algorithm 1 calculates the corresponding element of the resulting matrix, \(C^\prime_{y,x}\), where \(y = \lfloor tid / w_B \rfloor\) and \(x = tid \bmod w_B\).
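The tid-to-element mapping of Algorithm 1 can be mimicked on the CPU with Java's parallel streams (a sketch of the scheme only, not the paper's OpenCL kernel):

```java
import java.util.stream.IntStream;

public class ParallelMultiply {
    // One logical thread per element of the product matrix: tid -> (y, x),
    // where wB is the width of B (and of the product matrix).
    static int[][] multiply(int[][] a, int[][] b) {
        int rows = a.length, wB = b[0].length, m = b.length;
        int[][] c = new int[rows][wB];
        IntStream.range(0, rows * wB).parallel().forEach(tid -> {
            int y = tid / wB;          // row of the element this 'thread' computes
            int x = tid % wB;          // column of the element
            int sum = 0;
            for (int j = 0; j < m; j++) sum += a[y][j] * b[j][x];
            c[y][x] = sum;             // distinct element per tid: no data races
        });
        return c;
    }
}
```

Because every tid writes to a distinct element of the product, no synchronisation between the logical threads is needed, which is what makes the computation embarrassingly parallel on SIMD hardware.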

Coverage Collection Algorithm

After the matrix multiplication using Algorithm 1, coverage information is collected using a separate algorithm, the pseudo-code for which is shown in Algorithm 2. Unlike Algorithm 1, the coverage collection algorithm requires only n threads, i.e. one thread per column of matrix C′.

The loop in Lines (2) and (3) counts the number of structural elements that have been executed by the individual solution \(p_{tid}\). The coverage is calculated by dividing this count by the total number of structural elements that need to be covered.

While coverage information requires a separate collection phase, the sum of the costs for each individual solution has already been calculated by Algorithm 1 as part of the matrix multiplication, following the extension described in Section 3.

5 Experimental Setup

5.1 Research Questions

This section presents the research questions studied in the paper. RQ1 and RQ2 concern the scalability achieved by the speed-up through the use of GPGPU, whereas RQ3 concerns the practical implications of the speed-up, and of the consequent scalability, for practitioners.

RQ1. Speed-up

What is the speed-up factor of the GPU- and CPU-based parallel versions of the MOEAs over the untreated CPU-based versions of the same algorithms for the multi-objective test suite minimisation problem?

RQ2. Correlation

What factors of the problem instances correlate most strongly with the speed-up achieved, and what is the correlation between these factors and the resulting speed-up?

RQ3. Insight

What are the realistic benefits, to software engineers, of the scalability achieved by the GPGPU approach?

RQ1 is answered by observing the dynamic execution time of the parallel versions of the studied algorithms as well as that of the untreated single-threaded algorithms. For RQ2, two factors constitute the size of a test suite minimisation problem: the number of test cases in the test suite and the number of test requirements in the System Under Test (SUT) that need to be covered. The speed-up values measured for RQ1 are statistically analysed to investigate the correlation between the speed-up and these two size factors. RQ3 is answered by analysing the result of test suite minimisation obtained for a real-world testing problem.

5.2 Subjects

Table 1 shows the subject programs for the empirical study. Twelve of the programs and test suites are from the Software-artifact Infrastructure Repository (SIR) (Do et al. 2005). In order to obtain test suites of varying sizes, ranging from a few hundred to a few thousand test cases, the study includes multiple test suites for some subject programs. For printtokens and schedule, the smaller test suites are coverage-adequate test suites, whereas the larger test suites include all the available test cases. To avoid selection bias, four smaller test suites were randomly selected from the pool of available tests for each program. In the case of space, SIR contains multiple coverage-adequate test suites of similar sizes; four test suites were selected randomly.

Table 1 Subject programs used for the empirical study

The subjects also include a large system-level test suite from IBM. For this subject, the coverage information was maintained at the function level. The test suite contains only 181 test cases, but these test cases are used to cover 61,770 functions in the system.

Each test suite has an associated execution cost dataset. For the subject programs from SIR, the execution costs were measured by observing the number of instructions required by the execution of tests. This was performed using a well-known profiling tool, valgrind (Nethercote and Seward 2007), which executes the given program on a virtual processor. For ibm, physical wall-clock time data, measured in seconds, were provided by IBM. The entire test suite for ibm takes more than 4 h to execute.

5.3 Implementation and Hardware

Implementation

The paper uses the open source Java MOEA library, jMetal (Durillo et al. 2006, 2010), as the source of the untreated versions of the MOEAs: NSGA-II and SPEA2 are included in the jMetal library, while the Two Archive algorithm has been implemented using the infrastructure provided by the library. The untreated versions of the MOEAs evaluate the fitness of individual solutions in the population one at a time, which incurs method invocation overheads for the retrieval of coverage and cost information.

The GPGPU-based parallel versions of these three algorithms are implemented in the OpenCL GPGPU framework using a Java wrapper called JavaCL (Chafik 2009). The CPU-based parallel versions of the three algorithms use a parallel programming library for Java called JOMP (Bull et al. 2000). JOMP allows parameterised configuration of the number of threads to use. The parallelisation is only applied to the fitness evaluation step because it is not clear whether certain steps in the studied algorithms, such as sorting, may yield sufficient efficiency when performed in parallel.

All three algorithms are configured with a population size of 256, following the standard recommendation to set the number of threads to a multiple of 32 or 64 (ATI Stream Computing 2010). The archive size for SPEA2 and the Two Archive algorithm is also set to 256. The stopping criterion for all three algorithms is reaching the maximum number of fitness evaluations, which is set to 64,000, allowing 250 generations to be evaluated.

All three algorithms solve the test suite minimisation problem by selecting Pareto-optimal subsets of test cases, represented by binary strings that form columns in matrix B in Section 3. The initial population is generated by randomly setting the individual bits of these binary strings so that the initial solutions are randomly distributed in the phenotype space.

NSGA-II and SPEA2 use the binary tournament selection operator. The Two Archive algorithm uses the uniform selection operator described in the original paper (Praditwong and Yao 2006): the selection operator first selects one of the two archives with equal probability and then selects one solution from the chosen archive with uniform probability. All three algorithms use the single-point crossover operator with the probability of crossover set to 0.9, and the single bit-flip mutation operator with a mutation rate of \(\frac{1}{n}\), where n is the length of the bit-string (i.e. the number of test cases in the test suite).
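The variation operators described above can be sketched as follows (a generic Java sketch of single-point crossover and bit-flip mutation over the binary strings that form the columns of matrix B; it is not jMetal's actual implementation):

```java
import java.util.Random;

public class Operators {
    // Single-point crossover: exchange the suffixes of two parents at a random cut.
    static boolean[][] crossover(boolean[] p1, boolean[] p2, Random rnd) {
        int point = 1 + rnd.nextInt(p1.length - 1);  // cut strictly inside the string
        boolean[] c1 = new boolean[p1.length], c2 = new boolean[p1.length];
        for (int i = 0; i < p1.length; i++) {
            c1[i] = i < point ? p1[i] : p2[i];
            c2[i] = i < point ? p2[i] : p1[i];
        }
        return new boolean[][] {c1, c2};
    }

    // Bit-flip mutation with rate 1/n: one expected flip per offspring.
    static void mutate(boolean[] s, Random rnd) {
        double rate = 1.0 / s.length;
        for (int i = 0; i < s.length; i++)
            if (rnd.nextDouble() < rate) s[i] = !s[i];
    }
}
```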

Hardware

All algorithms have been evaluated on a machine with a quad-core Intel Core i7 CPU (2.8 GHz clock speed) and 4 GB memory, running Mac OS X 10.6.5 with Darwin Kernel 10.6.0 for x86_64 architecture. The Java Virtual Machine used to execute the algorithms is Java SE Runtime with version 1.6.0_22. The GPGPU-based versions of MOEAs have been evaluated on an ATI Radeon HD4850 graphics card with 800 stream processors running at 625 MHz clock speed and 512 MB GDDR3 onboard memory.

5.4 Evaluation

The paper compares three MOEAs, each with five different configurations: the untreated configuration (hereafter referred to as CPU), the GPGPU configuration (GPU) and the JOMP-based parallel configurations with 1, 2 and 4 threads (JOMP1/2/4). The configuration with one thread (JOMP1) is included to observe the speed-up achieved by evaluating the fitness of the entire population using matrix multiplication, instead of evaluating the solutions one by one as in the untreated versions of the MOEAs. Any speed-up achieved by JOMP1 over CPU is, therefore, primarily achieved by the code-level optimisation that removes the method invocation overheads. On the other hand, JOMP1 does incur an additional thread management overhead.

In total, there are 15 different configurations (three algorithms with five configurations each). For each subject test suite, the 15 configurations were executed 30 times in order to cater for the inherent randomness in dynamic execution time, measured using the system clock. The speed-up is calculated by dividing the execution time required by the CPU configuration by the execution time required by each parallel configuration. While we discuss the speed-up values using only the total execution time, we also provide observations of the three-part breakdown of the total execution time (Time total) for each algorithm, as follows:

  • Initialisation (Time init): the time it takes for the algorithm to initialise the test suite data in a usable form; for example, GPU configurations of MOEAs need to transfer the test suite data onto the graphics card.

  • Fitness Evaluation (Time fitness): the time it takes for the algorithm to evaluate the fitness values of different generations during its runtime.

  • Remaining (Time remaining): the remaining parts of the execution time, most of which is used for archive management, genetic operations, etc.

6 Results

This section presents the speed-up measurements of the single-threaded, CPU-based multi-threaded, and GPGPU-based multi-threaded approaches and analyses the correlation between the speed-up and problem size.

6.1 Speed-up

Figure 1 presents the mean paired speed-up results of all configurations. The mean paired speed-up values were calculated by dividing the execution time of CPU by the corresponding execution time of the parallel configurations for each of the 30 observations. Tables 2, 3 and 4 contain the speed-up data in more detail, whereas the statistical analysis of the raw information can be found in Tables 12, 13 and 14 in the Appendix.

Fig. 1 Mean paired speed-ups achieved by different algorithms and parallel configurations

Table 2 Speed-up results for NSGA-II
Table 3 Speed-up results for SPEA2
Table 4 Speed-up results for TAEA

Overall, the observed paired mean speed-up ranges from 0.47× to 25.09×. While the different archive management strategies used by each MOEA make it difficult to compare the execution time results directly (because the different amounts of heap used by each may affect the JVM’s performance differently), it is possible to observe the general trend that the speed-up tends to increase as the problem size grows. The speed-up values below 1.0 show that the overheads of thread management and additional communication can be detrimental for problems of sufficiently small size. However, as the problem size grows, JOMP1 becomes faster than CPU for all algorithms, indicating that the reduction in method call overhead eventually becomes greater than the thread management overhead.

With the largest dataset, ibm, the GPU configuration of NSGA-II reduces the average execution time of CPU, 4,347 s (1 h 12 min and 27 s), to the average of 174 s (2 min and 54 s). The speed-up remains consistently above 3.0× for all three algorithms if the problem size is larger than that of flex, i.e. about 400,000 (103 tests × 3,965 test requirements).

To provide inferential statistical analysis, the observed execution time data have been compared using the Mann-Whitney ‘U’ test, a non-parametric statistical hypothesis test that allows the comparison of two samples with unknown distributions. The execution time data observed with the JOMP1/2/4 and GPU configurations were compared to those from the CPU configuration. The null hypothesis is that there is no difference between the parallel configurations and the CPU configuration; the alternative hypothesis is that the execution time of the parallel configurations is smaller than that of the CPU configuration.
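The hypothesis test described above can be sketched as follows. This is a minimal, stdlib-only illustration of the one-sided Mann-Whitney U test using the normal approximation and no tie correction (a library routine such as scipy.stats.mannwhitneyu would normally be used); the timing data are hypothetical, not the paper's measurements.

```python
# Minimal one-sided Mann-Whitney U test (normal approximation, no tie
# correction), testing whether sample a tends to be smaller than sample b.
import math

def mann_whitney_one_sided(a, b):
    """P-value for the alternative hypothesis 'a tends to be smaller than b'."""
    combined = sorted((v, i) for i, v in enumerate(a + b))
    ranks = [0.0] * len(combined)
    for rank, (_, idx) in enumerate(combined, start=1):
        ranks[idx] = float(rank)
    r1 = sum(ranks[: len(a)])              # rank sum of sample a
    n1, n2 = len(a), len(b)
    u1 = r1 - n1 * (n1 + 1) / 2.0          # U statistic for sample a
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mu) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))  # P(Z <= z)

gpu_times = [170.0, 172.0, 174.0, 176.0, 178.0]       # hypothetical data
cpu_times = [4300.0, 4320.0, 4340.0, 4360.0, 4380.0]  # hypothetical data
p = mann_whitney_one_sided(gpu_times, cpu_times)
print(p < 0.05)  # True: reject the null at the 95 % confidence level
```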

Tables 5, 6 and 7 present the resulting p-values. With the JOMP1 and JOMP2 configurations, the alternative hypothesis is rejected at the 95 % confidence level for 39 and 12, respectively, of the 42 cases with problem sizes smaller than that of flex, providing evidence that these parallel configurations required more time than the untreated configuration (CPU). With the JOMP4 and GPU configurations, the null hypothesis is universally rejected for all subjects with problem sizes larger than that of flex, providing strong evidence that these parallel configurations required less time than the untreated configuration (CPU). The particular results naturally depend on the choice of graphics card used for the experiment. However, taken together, these results provide strong evidence that, for test suite minimisation problems of realistic sizes, the GPGPU approach can provide a speed-up of at least 3.0×. This finding answers RQ1.

Table 5 Mann-Whitney U test for NSGA-II
Table 6 Mann-Whitney U test for SPEA2
Table 7 Mann-Whitney U test for TAEA

6.2 Correlation

Regarding RQ2, one important factor that contributes to the level of speed-up is the speed of each individual computational unit in the graphics card. The HD4850 graphics card used in the experiment contains 800 stream processor units that are normally used for the computation of geometric shading. Each of these processors executes one of the threads of Algorithm 1, of which there exist more than 800. Therefore, if an individual stream processor were as powerful as a single core of the CPU, the absolute upper bound on the speed-up would be 800. In practice, the individual processors run at a clock speed of 625 MHz, which makes them much slower and, therefore, less powerful than a CPU core. This results in speed-up values well below 800.
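The bound above can be illustrated with a back-of-the-envelope calculation. This is purely illustrative: the CPU clock speed used here is a hypothetical assumption, not a figure from the paper, and clock speed is of course not the only difference between a stream processor and a CPU core.

```python
# Back-of-the-envelope bound: 800 stream processors give an absolute
# upper bound of 800x, but each runs at only 625 MHz. Assuming
# (hypothetically) a 2.5 GHz CPU core, and pretending clock speed were
# the only difference, the bound shrinks proportionally.
streams = 800
gpu_clock_mhz = 625.0
cpu_clock_mhz = 2500.0   # hypothetical CPU clock, not taken from the paper

naive_bound = streams                                  # one stream = one core
clock_adjusted = streams * gpu_clock_mhz / cpu_clock_mhz
print(naive_bound, clock_adjusted)  # 800 200.0
```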

In order to answer RQ2, statistical regression analysis was performed on the correlation between the observed speed-up and the factors that constitute the size of problems.

Three size factors have been analysed for the statistical regression: the size of the test goal set, the size of the test suite, and their product. The number of test requirements and the number of test cases are denoted by l and m respectively, following the matrix notation in Section 3: l is proportional to the number of threads the GPGPU version of the algorithm has to execute (as the size of the matrix C′ is l-by-n and n is fixed); m denotes the amount of computation that needs to be performed by a single thread (as each matrix-multiplication kernel computes a loop with m iterations). In addition to these measurements, another size factor z = l · m is considered to represent the perceived size of the minimisation problem. Table 8 shows the results of Spearman’s rank correlation analysis between the size factors and the observed speed-ups.

Table 8 Spearman’s rank correlation coefficients between three size factors and speed-ups

Spearman’s rank correlation is a non-parametric measure of how well the relationship between two variables can be described using a monotonic function. As one variable increases, the other variable will tend to increase monotonically if the coefficient is close to 1, whereas it would decrease monotonically if the coefficient is close to −1.
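The rank correlation described above can be sketched as follows; this is a stdlib-only illustration using the classical no-ties formula ρ = 1 − 6·Σd²/(n(n²−1)), with hypothetical data rather than the paper's measurements (a library routine such as scipy.stats.spearmanr would normally be used).

```python
# Stdlib-only sketch of Spearman's rank correlation: rank both variables,
# then apply rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid when there are
# no ties. Data are illustrative.

def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# A monotonically increasing relationship gives +1; decreasing gives -1.
l_values = [100, 500, 2000, 4000, 10000]   # hypothetical problem sizes
speedups = [0.5, 1.2, 3.1, 7.8, 25.0]      # hypothetical speed-ups
print(spearman_rho(l_values, speedups))        # 1.0
print(spearman_rho(l_values, speedups[::-1]))  # -1.0
```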

In all algorithms and configurations, the size factor l shows the strongest positive correlation with speed-ups. The correlation coefficients for z are weaker than those for l or, in the case of JOMP1 for the Two Archive algorithm, negative. The correlation for m remains negative for all algorithms and configurations.

To gain further insights into the correlation between size and the speed-up observed, a regression analysis was performed. Factor z is considered in isolation, whereas l and m are considered together; each variable has been considered in its linear form (z, l and m) and logarithmic form (logz, logl and logm). This results in 6 different combinations of regression models. Tables 9, 10 and 11 in the Appendix present the detailed results of regression analysis for the three algorithms respectively.

With a few exceptions of very small margins (NSGA-II with JOMP4 and SPEA2 with JOMP1, JOMP4, and GPU), the model with the highest r² correlation for all algorithms and configurations is S_p = α log l + β log m + γ. Figure 2 shows the 3D plot of this model for the GPU and JOMP4 configurations of the Two Archive algorithm.
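Fitting this model can be sketched as follows. This is a stdlib-only illustration of ordinary least squares for S_p = α log l + β log m + γ via the normal equations (in practice a library routine such as numpy.linalg.lstsq would be used); the data are synthetic, generated from known coefficients and then recovered, not the paper's measurements.

```python
# OLS fit of S_p = a*log(l) + b*log(m) + c via the normal equations,
# solved with a small Gauss-Jordan elimination. Synthetic data only.
import math

def solve3(A, y):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][3] / M[i][i] for i in range(3)]

def fit(ls, ms, sps):
    """Least-squares coefficients (alpha, beta, gamma) of the model."""
    rows = [[math.log(l), math.log(m), 1.0] for l, m in zip(ls, ms)]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Aty = [sum(r[i] * s for r, s in zip(rows, sps)) for i in range(3)]
    return solve3(AtA, Aty)

# Synthetic speed-ups generated from alpha=2.0, beta=-0.5, gamma=1.0:
ls = [100, 500, 2000, 4000, 10000, 30000]
ms = [50, 80, 120, 160, 200, 500]
sps = [2.0 * math.log(l) - 0.5 * math.log(m) + 1.0 for l, m in zip(ls, ms)]
alpha, beta, gamma = fit(ls, ms, sps)
print(round(alpha, 3), round(beta, 3), round(gamma, 3))  # recovers 2.0 -0.5 1.0
```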

Fig. 2 3D plot of the regression model S_p = α log l + β log m + γ for the GPU (solid line) and JOMP4 (dotted line) configurations of the Two Archive algorithm

The observed trend is that the inclusion of log l results in higher correlation, whereas models that use l in its linear form tend to result in lower correlation. This supports the results of the Spearman’s rank correlation analysis in Table 8. The coefficients of the best-fit regression model for GPU, S_p = α log l + β log m + γ, can explain why the speed-up results for the space test suite are higher than those for test suites with similar z values, such as tcas, gzip and replace. Apart from bash and ibm, space has the largest number of test requirements to cover, i.e. l. Since α is more than twice as large as β, a higher value of l has a greater impact on S_p than a higher value of m.

Based on the analysis, RQ2 is answered as follows: the observed speed-up shows a strong linear correlation with the logarithm of the number of test requirements to cover and the logarithm of the number of test cases in the test suite. The positive correlation provides evidence that GPU-based parallelisation scales up.

Furthermore, within the observed data, the speed-up continues to increase as the problem size grows, which suggests that the graphics card did not reach its full computational capacity. It may be that, for larger problems, the speed-up would be even greater than those observed in this paper. The finding that the scalability factor increases with overall problem size is a very encouraging finding; as the problem gets harder, the solution gets better.

6.3 Insights

This section discusses a possible real-world scenario in which the parallelisation of multi-objective test suite minimisation can have a high impact. A smoke test is a testing activity that is usually performed in a very short window of time to detect the most obvious faults, such as system crashes. IBM’s smoke test practice is to allow from 30 to 60 min of time to execute a subset of tests from a large test suite that would require more than 4 h to execute in its entirety.

Using a static smoke test suite is problematic, as running the same tests in every regression cycle greatly reduces the likelihood of finding bugs. It is therefore important to recalculate the most relevant smoke test suite given the changes to the code. It is for this reason that the cost of computation, especially the actual time it takes, becomes very important.

Figure 3 shows two possible smoke test scenarios based on the results of the CPU and GPGPU configurations of NSGA-II. It plots how much test requirement coverage can be achieved within a given time, including the time needed for multi-objective test suite minimisation. The solid line represents the scenario based on the GPGPU configuration of the algorithm, whereas the dotted line represents the scenario based on the CPU configuration. The flat segment at the bottom left shows the time each configuration spends on the optimisation process; the curved segment shows the trade-off between time and test coverage achieved by the optimised test suite. Since the CPU configuration of NSGA-II takes longer than 60 min to terminate, it cannot contribute to any smoke test scenario that must be completed within 60 min. The GPGPU configuration, on the other hand, allows the tester to consider a subset of tests that can be executed within 30 min. Had the grey region in Fig. 3 been wider, the difference between the two configurations would have been even more dramatic.
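The trade-off above can be sketched as follows. This is a hypothetical illustration (the function, test durations, and coverage gains are invented for the example, not taken from the paper): the smoke-test time budget must first absorb the optimisation time itself, and only the remainder is available for executing the minimised suite.

```python
# Sketch of the smoke-test trade-off: the optimisation time is deducted
# from the budget before any tests can run. All numbers are illustrative.

def coverage_within_budget(opt_minutes, tests, budget_minutes):
    """tests: list of (execution_minutes, added_coverage) pairs, in the
    order in which the minimised suite schedules them."""
    remaining = budget_minutes - opt_minutes
    covered = 0.0
    for cost, gain in tests:
        if remaining < cost:
            break
        remaining -= cost
        covered += gain
    return covered

# Hypothetical minimised suite; GPU optimisation finishes in ~3 minutes,
# while the CPU configuration needs over an hour.
suite = [(5, 40.0), (10, 25.0), (10, 15.0), (15, 10.0)]
print(coverage_within_budget(3, suite, 30))   # 80.0: GPU leaves time to test
print(coverage_within_budget(72, suite, 60))  # 0.0: CPU exhausts the budget
```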

Fig. 3 Comparison of smoke test scenarios for the IBM system (ibm). The solid line shows the trade-off between time and test coverage when the GPU configuration of NSGA-II is used, whereas the dotted line shows that of the CPU configuration. The grey area shows the interesting trade-off that the CPU configuration fails to exploit within 60 min

This answers RQ3 as follows: a faster execution of optimisation algorithms enables the tester not only to use the algorithms but also to exploit their results more effectively. This real-world smoke test example from IBM demonstrates that scale-ups accrued from the use of GPGPU are not only sources of efficiency improvement, they can also make possible test activities that are simply impossible without this scalability.

The ability to execute a sophisticated optimisation algorithm within a relatively short time also allows the tester to consider state-of-the-art regression testing techniques with greater flexibility. The greater flexibility is obtained because the cost of the optimisation does not have to be amortised across multiple iterations. Many state-of-the-art regression testing techniques require the use of continuously changing sets of testing data, such as recent fault history (Yoo and Harman 2007) or the last time a specific test case has been executed (Kim and Porter 2002; Engström et al. 2010). In addition to the use of dynamic testing data, the previous work also showed that repeatedly using the same subset of a large test suite may impair the fault detection capability of the regression testing (Yoo et al. 2009).

7 Threats to Validity

Threats to internal validity concern the factors that could have affected the experiments in the paper. While GPGPU architecture has been researched for some time, the commercially available GPGPU frameworks such as CUDA and OpenCL are still in their early stages and, therefore, may contain faults in the implementation.

The GPGPU matrix-multiplication routine has been manually tested and validated with additional data apart from the test suites chosen for the empirical study. Regarding the precision of the GPGPU-based calculation, the aim of the paper is to investigate the potential speed-up that can be gained by using GPGPU, rather than to evaluate the effectiveness of the actual test suite minimisation in the context of regression testing. Therefore, precision does not constitute a major threat to the aim of this study.

Threats to external validity concern any factor that might prevent the generalisation of the results presented in the paper. Since the performance of GPGPU computing is inherently hardware specific, the results reported in the paper may not be reproducible in their exact form with other combinations of hardware components. However, given the advances in graphics card architecture, experiments applying the same approach on newer graphics cards are likely only to improve the speed-up results reported in the paper.

It should also be noted that test suite minimisation by evolutionary computation is an inherently ideal candidate for GPGPU computation, as the reformulated problem, matrix multiplication, is highly parallel in nature. Other problems in search-based software engineering may not lend themselves as readily to such a reformulation as the test suite minimisation problem does. However, this issue is inherent in any attempt to parallelise a software engineering technique and is not specific to the GPGPU approach.

Threats to construct validity arise when the measurements used in the experiments do not capture the concepts they are supposed to represent. The speed-up calculation was based on measurements of execution time for both algorithms using the system clock, which was chosen because it represents the speed of a technique as experienced by the end-user. Regarding the measurements of problem size used for the regression analysis, there may exist more sophisticated measurements of test suites and program source code that correlate better with the speed-up. However, the number of test requirements and the number of test cases are probably the most readily available measurements of source code and test suites, and they provide a reasonable starting point for this type of analysis.

8 Related Work

Recent developments in graphics hardware provide an affordable means of parallelism: not only is the hardware more affordable than multiple PCs, but the management cost is also much smaller than that required for a cluster of PCs, because only a single hardware component is involved. GPGPU has been successfully applied to various scientific computations (Boyer et al. 2009; Govindaraju et al. 2006), and Langdon and Banzhaf (2008) recently used GPGPU for performance improvements in optimisation problems. However, these techniques had not been applied to Search Based Software Engineering problems, motivating the work of this paper, i.e., the use of GPGPU to achieve performance improvements in SBSE.

As a first instance of GPGPU SBSE, we study the SBSE application domain of regression testing. Regression test selection (also known as test suite minimisation) aims to reduce the number of tests to be executed by calculating the minimum set of tests that are required to satisfy the given test requirements. The problem has been formulated as the minimal hitting set problem (Harrold et al. 1993), which is NP-hard (Garey and Johnson 1979).

Various heuristics for the minimal representative set problem, or the minimal set cover problem (the dual of the former), have been suggested for test suite minimisation (Chen and Lau 1996; Offutt et al. 1995). However, empirical evaluations of these techniques have reported conflicting views on the impact on fault detection capability: some reported no impact (Wong et al. 1998, 1999) while others reported compromised fault detection capability (Rothermel et al. 2002a, b).

One potential reason why test suite minimisation has a negative impact on the fault detection capability is the fact that the criterion for minimisation is structural coverage; achieving coverage alone may not be sufficient for revealing faults. This paper uses the multi-objective approach based on Multi-Objective Evolutionary Algorithm (MOEA) introduced by Yoo and Harman (2007); the paper also presents the first attempt to parallelise test suite minimisation with sophisticated criteria for scalability. Multi-objective forms of regression testing problems are increasingly popular in SBSE work, since most real-world regression testing scenarios need to satisfy multiple objectives (Harman 2011). Our SBSE approach to regression testing has also been used at Google (Yoo et al. 2011b).

Population-based evolutionary algorithms are ideal candidates for GPGPU parallelisation (Owens et al. 2007) and existing work has shown successful implementations for classical problems. Tsutsui and Fujimoto implemented a single-objective parallel Genetic Algorithm (GA) using GPU for the Quadratic Assignment Problem (QAP) (Tsutsui and Fujimoto 2009). Wilson and Banzhaf implemented a linear Genetic Programming (GP) algorithm on the Xbox 360 game console (Wilson and Banzhaf 2009). Langdon and Banzhaf implemented GP for GPU using a SIMD interpreter for fitness evaluation (Langdon and Banzhaf 2008). Wong implemented an MOEA on GPU and evaluated the implementation using a suite of benchmark problems (Wong 2009). Wong’s implementation parallelised not only the fitness evaluation step but also the parent selection, the crossover and mutation operators, and the dominance checking.

Despite the highly parallelisable nature of many techniques used in SBSE, few parallel algorithms have been used. Of 763 papers on SBSE (Zhang 2011), only three present results for parallel execution of SBSE. Mitchell et al. used a distributed architecture for their clustering tool Bunch (Mitchell et al. 2001). Mahdavi et al. (2003) used a cluster of standard PCs to implement a parallel hill climbing algorithm. Asadi et al. used a distributed client-server architecture for the concept location problem (Asadi et al. 2010). All three of these previous approaches use a distributed architecture that requires multiple machines. The present paper is the first work on SBSE that presents results for the highly affordable parallelism based on GPGPU.

There is existing work that re-implements meta-heuristic algorithms on GPGPU for non-SBSE applications. This previous work ports the meta-heuristic algorithms in their entirety (Wong 2009) to GPGPU, whereas the present paper concerns the practical improvement in scalability of SBSE, rather than the technical feasibility of GPGPU-based meta-heuristic implementation. The present paper thus re-implements only the most parallelisable module in Multi-Objective Evolutionary Algorithms (MOEAs) and performs an extensive empirical evaluation of the impact on scalability.

We believe that this approach may also prove to be applicable to many other SBSE problems. To achieve this, it will be necessary to develop new ways to port the fitness computation to the GPGPU device. For some applications, such as test data generation and re-generation problems (Ali et al. 2010; Yoo and Harman 2012b), this may not be possible, because the fitness function requires execution of the program under test; we cannot be sure that GPGPU devices will ever develop to the point that execution of arbitrary code will become possible.

However, for other SBSE applications, such as requirements optimisation (Saliu and Ruhe 2007; Zhang et al. 2011), prediction system feature selection (Kirsopp et al. 2002) and requirements sensitivity analysis (Harman et al. 2009), fitness computation remains an oft-repeated and, thereby, SIMD-friendly requirement. Also, in these application areas, the underlying Software Engineering problem may be characterised using a table (albeit a very large one). In such cases, where the SBSE problem can be formulated in terms of the optimisation of choices, based on a spreadsheet of known values, this may prove to port well onto the SIMD architecture offered by GPGPU devices.

Furthermore, for requirements optimisation problems there is known to be a close similarity between the problem representation for search based requirements and search based regression testing (Harman et al. 2012). As a result, the techniques used here may also prove to be readily applicable to these problems with little need for significant modification of our approach.

9 Conclusion

This paper presented the first results on GPGPU SBSE: the use of GPGPU-based massive parallelism to improve the scalability of regression testing, based on Search-Based Software Engineering (SBSE). The advances in GPGPU architecture and the consequent availability of parallelism provide an ideal platform for improving SBSE scalability through SIMD parallelism.

The paper presents an evaluation of GPGPU-based test suite minimisation for real-world examples that include an industry-scale test suite. The approach was evaluated on three popular multi-objective evolutionary algorithms. The results show that GPGPU-based optimisation can achieve a speed-up of up to 25.09× compared to a single-threaded version of the same algorithm executed on a CPU. The highest speed-up achieved by the CPU-based parallel optimisation was 9.29×. Statistical analysis shows that the speed-up correlates with the logarithms of the problem size factors, i.e. the size of the program under test and the size of the test suite. This finding indicates that, as the problem becomes larger, the scalability of the proposed approach increases: a very attractive finding. Future work will include an empirical study of a wider range of test suites, as well as seeking insights into why the MOEAs benefit differently from parallelisation.