1 Introduction

Software Product Lines (SPLs) are used to achieve a more efficient software development and management of the variability of software products, reducing the costs and time to market, as well as maintenance costs (Pohl et al. 2005). These product lines carry a great variability within products of the same family of products. This variability is due to the mass customization and it implies a great challenge when we face the task of testing because of the combinatorial explosion in the number of products (Engström and Runeson 2011).

Many proposals have arisen having into account these difficulties (Cohen et al. 2008). Some of these are based on pairwise testing (Lopez-Herrejon et al. 2013; Oster et al. 2010; Perrouin et al. 2012), where each possible combination of two features must be present in at least one product. Some combinations can be more important than others, introducing a priority among configurations or features. In this case, a weight is assigned to each configuration, which can be derived from weights assigned to products. The optimization problem that we want to solve consists in finding a set of products with the minimum cardinal covering all weighted configurations. Additionally, we want to sort the products in such a way that we first test the products containing higher priority features.

Recent state-of-the-art proposals on pairwise testing include hybrid algorithms, mixing heuristics with exact algorithms (Ferrer et al. 2017). Henard et al. (2014) proposed a similarity measure to build test suites by adding the most dissimilar products. In this way, they designed a very fast search-based algorithm for t-wise test data generation. Also, there are other proposals dealing with multiple objectives (Henard et al. 2015; Xue and Li 2018), however, none of them take into account feature priorities (nor weights) like we do in this work. Other many-objective approaches such as Hierons et al. (2016) define the violation of model constraints as an objective. This approach might lead us to a solution which violates some constraints.

The hypothesis for this work is that a hybrid matheuristic approach can improve the performance on large instances of the problem, particularly, generating in a probabilistic way new sub-instances of the problem that are combined and solved with an exact algorithm in order to find the best solutions to the whole problem.

Our main contribution is the adaptation of a new approach based on a matheuristic named Construct, Merge, Solve and Adapt to solve the Prioritized Pairwise Test Data Generation Problem. In order to validate the benefits of our proposal we compare the results with four algorithms: two hybrid algorithms based on integer programming (Ferrer et al. 2017), one with a linear formulation (HILP) and the other one with a nonlinear formulation (HINLP); the Parallel Prioritized Genetic Solver (PPGS) (Lopez-Herrejon et al. 2014); and a greedy algorithm called prioritized-ICPL (Johansen et al. 2012).

The rest of the article is divided into seven sections. In the next section we introduce the background required to understand both the problem and the algorithm that we propose. Section 3 is devoted to the formalization of the Prioritized Pairwise Test Data Generation Problem. In Sect. 4, we explain the design and implementation of the CMSA adaptation to the Prioritized Pairwise Test Data Generation Problem. In Sect. 5, we briefly explain the algorithms in the comparison and the experimental setup. Results are analyzed in Sect. 6 and we present our conclusions in Sect. 7.

2 Background

2.1 Feature models

Feature models are used in SPLs to define the functionality of a product within a software family as a single combination of features. Models can also represent the constraints that exist between features. A hierarchical tree structure is used to characterize a feature model, where the nodes of the tree are features and the edges represent relationships between these features.

Figure 1 represents the feature model of a classical problem in the evaluation of product-line methodologies, the Graph Product Line (GPL) (Lopez-Herrejon and Batory 2001).

Fig. 1
figure 1

Graph Product Line feature model

There exist four types of relationships between features, differentiated graphically in the feature model:

  • Mandatory These features are selected when their parent is selected. For example, in the Graph Product Line model, Driver, Benchmark, GraphType and Algorithms are present in all the products of the family.

  • Optional They can be or not be selected, like for example, the Weight or Search features.

  • XOR relations In these cases only one of the features of the group must be selected when the parent feature is selected. DFS and BFS features are of this kind in the Graph Product Line model.

  • Inclusive-or This last kind of relation indicates that at least one of the features of the group must be selected when its parent feature is selected. In the example, when the feature Algorithms is selected, at least one of the group {Num, CC, SCC, Cycle, Shortest, Prim and Kruskal} must be selected.

There are two other types of constraints over the model, called Cross-Tree Constraints (CTC): requires and excludes. In the Graph Product Line feature model, we can observe these kind of constraints below the tree structure. For example “Num requires Search” implies that when the Num feature is selected, the Search feature must be selected as well. On the other hand, “Kruskal excludes Prim” implies that when the Kruskal feature is selected, the Prim feature must not be selected.

It can easily be seen that these constraints can be formalized using propositional logic.

2.2 Construct, merge, solve and adapt

Matheuristics are techniques that combine metaheuristics and mathematical programming techniques. The Construct, Merge, Solve & Adapt (CMSA) algorithm is a matheuristic for combinatorial optimization introduced by Blum et al. (2016). Before describing the algorithm we present some concepts.

Given a problem P, let C be the set of all possible components of the solution to the instance of our problem I. C is called the complete set of solution components with respect to I. A valid solution S to I is represented as a subset of the solution components, that is, \(S \subseteq C\). Finally, a sub-instance \(C'\) of I is a subset of the set of solution components, so \(C' \subseteq C\). The idea of the algorithm is the following (flow graph in Fig. 2).

Fig. 2
figure 2

CMSA flow graph (image by C. Blum)

While the time limit established is not reached:

  1. 1.

    First, it generates \(n_a\) solutions of the main instance of the problem I.

  2. 2.

    All the components belonging to the generated solutions are merged to form a sub-instance of the problem, \(C'\).

  3. 3.

    An exact algorithm is applied to the sub-instance \(C'\), expecting that the solution for \(C'\) is (quasi-)optimal for C.

  4. 4.

    The solution is compared with the best solution found at the moment and \(C'\) is updated accordingly with a defined aging policy, mostly deleting useless solution components.

The algorithm (see Algorithm 1) has two key components that have to be defined accordingly for the target problem:

  • A solution generator: based on some randomized strategy and providing high quality solutions.

  • An exact solver for the sub-instances: for example, based on ILP or other exact algorithm.

figure a

With the aim of clarifying the behaviour of the algorithm, we are going to describe it informally. Once the algorithm has an initial population (several test suites), it merges all the different products which comprises the test suites. Then, an exact solver selects a test suite, from the set of products, with minimum cardinality aiming at total weighted coverage, which is at least as good as the best known. Then, the algorithm execute the Adapt method (Algorithm 1, Line 17). The algorithm increments the age of the products not used in the new solution. When the age of a product is over a threshold, the product is discarded. In Sect. 4, we explain how we adapt the algorithm to prioritized SPL.

3 Problem formalization: prioritized pairwise test data generation

Now we present the terminology related to Combinatorial Interaction Testing (CIT). This approach builds a set of samples that allow to test different system configurations (Nie and Leung 2011). When we apply this approach to SPL testing, the set of samples is a subset of the products of the family. Next, we introduce the concepts that will lead us to the Prioritized Pairwise Test Data Generation Problem.

Definition 1

(Feature list) A feature list FL is the list of all the features in a feature model. The feature list of the running example shown in Fig. 1 is the following:

FL = {GPL, Driver, Benchmark, GraphType, Directed, Undirected, Weight, Search, DFS, BFS, Algorithms, Num, CC, Prim, SCC, Kruskal, Cycle, Shortest}.

Definition 2

(Product) A product is represented by a pair \((S, {\bar{S}})\), where S is a subset of a feature list FL, \(S \subseteq FL\). Thus, \({\bar{S}} = FL - S\). For example, a product p could be Benchmark selected and the rest of features unselected.

p = ({Benchmark},{GPL, Driver, GraphType, Directed, Undirected, Weight, Search, DFS, BFS, Algorithms, Num, CC, Prim, SCC, Kruskal, Cycle, Shortest})

Definition 3

(Valid product) We say that a product p is valid with respect a feature model fm iff p.S and \(p.{\bar{S}}\) do not violate any constraint described by the feature model. The set of all valid products of a feature model is denoted by \(P^{fm}\). For example, a valid product vp is GPL, Driver, Benchmark, GraphType, Directed, Algorithms, Num, Search, and DFS selected and the rest of features unselected.

vp = ({GPL, Driver, Benchmark, GraphType, Directed, Algorithms, Num, Search, DFS},{Undirected, Weight, BFS, CC, Prim, SCC, Kruskal, Cycle, Shortest})

Definition 4

(Pair) A pair pr is a tuple \((s, {\bar{s}})\), where s and \({\bar{s}}\) represent two disjoint feature sets from a feature list FL, which union has two different features. That is, \(pr.s \cup pr.{\bar{s}} \subseteq FL\), \(pr.s \cap pr.{\bar{s}} = \emptyset \) and \(|pr.s \cup pr.{\bar{s}}|=2\). A pair pr is covered by a product p iff \(pr.s \subseteq p.S\ \wedge \ pr.{\bar{s}} \subseteq p.{\bar{S}}\). For example, a pair pa is Directed and Undirected both selected.

\(pa=(\{Directed,Undirected\},\emptyset )\)

Definition 5

(Valid Pair) A pair pr is valid within a feature model fm if there exists a product that covers pr. The set of all valid pairs in a feature model fm is denoted with \(VPR^{fm}\). For example, a valid pair vpa is Directed selected and Undirected unselected.

\(vpa=(\{Directed\},\{Undirected\})\)

Based on the previous definitions of feature list, product, valid product, pair and valid pair, we define higher levels concepts related to the problem formulation. In the following we define the concept of test suite, prioritized product, configuration, covering array and coverage.

Definition 6

(Test suite) A test suite ts for a feature model fm is a set of valid products of fm. A test suite ts is complete if it covers all the valid pairs in \(VPR^{fm}\), that is, \(\forall pr \in VPR^{fm} \rightarrow \exists p \in ts\) such that p covers pr

Definition 7

(Prioritized product) A prioritized product pp is a tuple (pw), where p represents a valid product in a feature model fm and \(w \in {\mathbb {R}}\) represents its weight.

Definition 8

(Configuration) A configuration c is a tuple (prw) where pr is a valid pair and \(w \in {\mathbb {R}}\) represent its weight. w is computed as follows. Let PP be the set of all prioritized products and \(PP_{pr}\) a subset of PP, such that \(PP_{pr}\) contains all the prioritized products of PP that cover pr, that is, \(PP_{pr} = \{\ p \in PP\ |\ p \) covers \( pr\ \}\). Then \(w = \sum _{p\in PP_{pr}} p.w\)

Definition 9

(Covering Array) A covering array CA for a feature model fm and a set of configurations C is a set of valid products P that covers all configurations in C whose weight is greater that zero: \(\forall c \in C\ (c.w > 0 \rightarrow \exists p \in CA \) such that p covers c.pr).

Definition 10

(Coverage) Given a covering array CA and a set of configurations C, we define cov(CA) as the sum of all configuration weights in C covered by any configuration in CA divided by the sum of all configuration weights in C, that is:

$$\begin{aligned} cov( CA ) = \frac{\sum \nolimits _{\begin{array}{c} c\in C,\ \exists p \in CA ,\ p\ covers\ c.pr \end{array}} c.w}{\sum \nolimits _{c\in C} c.w} . \end{aligned}$$
(1)

The optimization problem of our interest consists in finding a covering array CA with the minimum number of products |CA| for a given coverage, cov(CA). This problem is defined as single-objective, but it is possible to formulate the problem as bi-objective, where we want to maximize coverage and minimize the number of products. We can even add more objectives to the formulation, like the cost of building a product for testing (we are considering in this work that all the products have the same construction cost). Adding new objectives to the problem does not prevent CMSA of being used. Although CMSA solves single-objective problems, there are high level algorithms that can be applied to solve a multi-objective problem using single-objective solvers. One interesting example in the case of multi-objective Integer Linear Programming is the one proposed by Dächert and Klamroth in Dächert and Klamroth (2015). Thus, neither the use of ILP solvers inside CMSA nor the use of CMSA itself pose a serious limitation to the number of objectives we can add to our current formulation.

4 Applying CMSA to prioritized SPL

In order to apply the CMSA algorithm to our problem we have to define the three methods described in Algorithm 1: ProbabilisticSolution, ExactSolver and Adapt methods. Before that, we have to guess what is a solution component for the prioritized pairwise test data generation in the CMSA algorithm. In this case, given that we want to minimize the cardinality of the set of products that entirely covers all the possible configurations in the feature model, a solution component is a valid product from the feature model FM, being P the set of all valid products of FM.

Coming up next we introduce the idea of the three components of the CMSA adaptation to our problem. First, we explain the ProbabilisticSolution method, then we present the ExactSolver method. Finally, we describe the aging policy.

4.1 Solution generation

At the start of the main loop of the algorithm, several solutions for P are generated to be merged in a final sub-instance that is solved later. We consider the whole search space P, generating random products that are valid within the feature model. The generation of random valid products is performed by means of an ILP solver which is very fast, being faster and faster as the set of uncovered pairs becomes smaller, because it considers less weighted pairs in the objective function. We have set a stopping condition of 3 seconds for the ILP solver, just to be sure it returns a solution. This process is also used for the initial population. The pseudo-code of this method is described below in Algorithm 2.

figure b

The fact that there is no uncovered configuration in each solution generated ensures that the solution will always cover all the weighted configurations, and this property also holds in the sub-instance generated after merging all the solutions. This strategy guarantees that, eventually, the algorithm will find a solution better than the previous one.

4.2 Exact solver

We use an exact algorithm to compute the best solution for the sub-instance generated. We use an ILP solver to select a subset of products for the sub-instance which, reaching a given weighted coverage, has minimum cardinality. This problem is equivalent to the hitting set problem or the test suite minimization problem, in its mono-objective version (Arito et al. 2012) and, thus, we can use the same integer linear program to solve the problem.

4.3 Adapting the sub-instance

The aging mechanism used here is the same proposed by Blum et al. (2016). In each iteration of the algorithm, the sub-instance is composed by a subset \(C' \subseteq C\) of the components of the problem. The ExactSolver method returns a solution, which is a subset \(S \subseteq C'\) of components. Then, the “age” of all the components that are part of the solution S are reset to 0, while the rest of the components see their age increased by 1. If any component reaches the maximum age established (\(age_{max}\)), then the component is removed from \(C'\). Algorithm 3 shows the pseudo-code for the Adapt method.

figure c

5 Experimental setup

In this section we describe how the analysis of the approach is performed. First, we introduce the other four algorithms we compare with CMSA. Then, we describe the benchmark used for the evaluation. Finally, we explain the different experiment configurations.

5.1 Hybrid algorithms based on integer programming

Two different hybrid algorithms combining a greedy heuristic and integer programming were introduced by Ferrer et al. (2017). The first one, called HILP, is based on an integer linear formulation, and the second, named HINLP, is based on a quadratic (nonlinear) integer formulation. The two algorithms proposed in this work use the same high level greedy strategy. In each iteration they try to find a product that maximizes the weighted coverage. They select in each iteration the product that contributes with greater coverage to the actual solution. The algorithm applies the heuristic to the whole product set (that can be of billions of possible products) instead of small subsets. For further details on HILP or HINLP, please refer to Ferrer et al. (2017).

5.2 Prioritized pairwise genetic solver

Prioritized Pairwise Genetic Solver (PPGS) is a constructive genetic algorithm that follows a master-slave model to parallelize the individuals’ evaluation. In each iteration, the algorithm adds the best product to the test suite until all weighted pairs are covered. The best product to be added is the product that adds more weighted coverage (only pairs not covered yet) to the set of products.

The parameter setting used by PPGS is the same of the reference paper for the algorithm (Lopez-Herrejon et al. 2014). It uses binary tournament selection and a one-point crossover with a probability 0.8. The population size of 10 individuals favours the exploitation rather than the exploration during search. The termination condition is to reach 1000 fitness evaluations. The mutation operator iterates over all selected features of an individual and randomly replaces a feature by another one with a probability 0.1. The algorithm stops when all the weighted pairs have been covered. For further details on PPGS see Lopez-Herrejon et al. (2014).

5.3 Prioritized-ICPL algorithm

Prioritized-ICPL (pICPL) is a greedy algorithm to generate t-wise covering arrays proposed by Johansen et al. (2012). pICPL does not compute covering arrays with full coverage but rather covers only those t-wise combinations among features that are present in at least one of the prioritized products, as was described in the formalization of the problem in Sect. 3. We must highlight here that the pICPL algorithm uses data parallel execution, supporting any number of processors. Their parallelism comes from simultaneous operations across large sets of data. For further details on prioritized-ICPL please refer to Johansen et al. (2012).

5.4 Benchmark

The feature models that we use for the comparison of the algorithms are generated from 16 real SPL systems. We considered a method called measured values to assign weight values to prioritized products. This method consists in assigning the weights derived from non-functional property values obtained from 16 real SPL systems, that were measured with the SPL Conqueror approach introduced by Siegmund et al. (2013). This approach aims at providing reliable estimates of measurable non-functional properties such as performance, main memory consumption, and footprint. These estimations are then used to emulate more realistic scenarios where software testers need to schedule their testing effort giving priority, for instance, to products or feature combinations that exhibit higher footprint or performance. In this work, we use the actual values taken on the measured products considering pairwise feature interactions. Table 1 summarizes the SPL systems evaluated, their feature number (FN), products number (PN), configurations number measured (CN), and the percentage of prioritized products (PP%) used in our comparison.

Table 1 Benchmark of feature models

In the case of the feature models where the percentage of the prioritized products is equal to 100%, applying the heuristic to the whole solution components set P (without generating sub-instances) should give similar results than applying HINLP. A greedy implementation of CMSA without generating sub-instances have been implemented in order to test the rest of the functionalities of the algorithm and validate the consistency of the results.

5.5 Experiments configuration

The experiments were run on a cluster of 16 machines with Intel Core2 Quad processors Q9400 (4 cores per processor) at 2.66 GHz and 4 GB memory and 2 nodes (96 cores) equipped with two Intel Xeon CPU (E5-2670 v3) at 2.30 GHz and 64 GB memory. The cluster was managed by HTCondor 8.2.7, which allowed us to perform parallel independent executions to reduce the overall experimentation time.

For the CMSA algorithm, different parameter configurations of time limit (\(max_{time}\)), solutions generated per iteration (\(n_a\)) and maximum age for the aging policy (\(age_{max}\)) were used. As a result, we chose the best configuration with \(n_a\)=5 and \(age_{max}=4\). For each configuration, a total of 30 independent runs were executed.

We have to keep in mind that the \(max_{time}=\)3 seconds indicates the limit of time for the last iteration, in cases of some feature models (i.e. Violet) where the number of constraints is high, the algorithm can take more time. We applied the Kolmogorov–Smirnov normality test to confirm that the distribution of the results is not normal. Therefore, we applied the non-parametric Kruskal–Wallis test with a confidence level of 95% (p-value under 0.05) with Bonferroni’s p value correction to check if the observed differences are statistically significant. In the cases where Kruskal-Wallis test rejects the null hypothesis, we run a single factor ANOVA post hoc test for pairwise comparisons.

In order to properly interpret the results of statistical tests, it is always advisable to report effect size measures. For that purpose, we have also used the non-parametric effect size measure \({\widehat{A}}_{12}\) statistic proposed by Vargha and Delaney (2000). It tells us how often, on average, one technique outperforms the other. It could be used to determine the probability of yielding higher performance by different algorithms. Given a performance measure M, \({\hat{A}}_{12}\) measures the probability that running algorithm A yields higher M values than running another algorithm B. If the two algorithms are equivalent, then \({\hat{A}}_{12}=0.5\). If \({\hat{A}}_{12}=0.3\) then one would obtain higher values for M with algorithm A, 30% of the time.

6 Analysis of the results

In this section we analyze the results of the execution of CMSA in comparison with the results of state-of-the-art algorithms (HILP, HINLP, PPGS and pICPL). Table 2 summarizes the results of the execution of the algorithms for different values of weighted coverage. Each column corresponds to one algorithm and in the rows we show the number of products required to reach 50% up to 100% of weighted coverage. The data shown in each cell is the mean and the standard deviation of the number of products required to reach the coverage of the SPL for all the independent runs of the whole benchmark of feature models. We also show the average and standard deviation of the required runtime in the last line. We highlight the best value for each percentage of weighted coverage and the shortest execution time.

We observe that CMSA is the best in solution quality (less products required to obtain a particular level of coverage) for all percentages of weighted coverage. After CMSA, the algorithms based on integer programming obtain the best solutions. The difference in quality between HILP and HINLP are almost insignificant, except for 100% coverage, so it is difficult to claim that one algorithm is better than the other. Then, PPGS is the fourth algorithm in our comparison, and finally pICPL is the worst.

Table 2 Mean and standard deviation of number of products and time in 30 independent runs of the whole benchmark of feature models

Regarding the execution time, we can appreciate that HINLP is clearly the fastest algorithm, actually thanks to the nonlinear formulation it produces a boost in computation time due to the reduction of clauses in comparison with the linear variant of the algorithm. It is closely followed by HILP and pICPL (also based on a greedy strategy). They are followed by CMSA in the speed ranking and, finally, quite far from the rest, PPGS, a genetic algorithm.

In order to check whether the differences between the algorithms are statistically significant or just a matter of chance, we have applied the statistical tests explained in the previous section. For 50% coverage there is no significant differences between CMSA and the others. Next, for 75% up to 85% of weighted coverage, there are significant differences between CMSA and pICPL. Finally, for 90% up to 100% CMSA is statistically significantly better than the other algorithms. Regarding the execution time, HINLP is statistically better than the other algorithms.

In Table 3 we show the number of times an algorithm obtains the minimum mean value for each percentage of coverage and for the 16 realistic feature models. Note that Table 3 summarizes the results showed in Table 6 in the “Appendix”. On the one hand, we observe that in only 3 out of 176 comparisons (16 feature models and 11 percentages of weighted coverage) other algorithm different than CMSA has obtained a better value for solution quality. On the other hand, in 173 out of 176 comparisons CMSA obtains the best results of the comparison, moreover in 70 out of 176 CMSA is the only obtaining the best value, i.e., no other algorithm obtains such a low value. In general, it can also be seen in this table that, the larger the value of weighted coverage, the better the results of CMSA. The reason behind this behavior is that in early stages of the search, for small and medium values of coverage, it is easier to find the products which add not-covered-yet pairs. When the search progresses, it is harder to find the products which are able to add not-covered-yet pairs.

Table 3 Times an algorithm has the best average number of products per percentage of coverage

Let us now focus on how the algorithms obtain total weighted coverage in each feature model. In Table 4 we show the mean value for 100% weighted coverage. We also show the standard deviation, which is 0 in many cases. In most feature models (15 out of 16) CMSA obtains the best value. The exception is in the Linux feature model, where pICPL is the best. In three models (Curl, Prevayler and Violet) CMSA obtains the best value, although at least other algorithm is able to reach the same value. In the rest of the feature models analyzed here (12 out of 16) no other algorithm reach the same solution quality than CMSA. As we expected, there are significant differences between CMSA and the other proposals in the pairwise comparisons. Specifically, in 52 out of 64 comparisons (16 feature models and 4 algorithms) CMSA is significantly better than the other algorithms, in 11 out of 64 there is no difference between CMSA and other algorithm, and only once it is significantly worse, particularly in the Linux model with pICPL.

Table 4 Mean values over 30 independent runs for CMSA, HILP, HINLP, PPGS and pICPL

In the comparison between CMSA and HINLP (the second best algorithm), we can appreciate that HINLP is doubtless faster. However, in solution quality it is only able to find the best known solution in two feature models. In addition, there are significant differences between CMSA and HINLP in 14 out of 16 models, so we can claim that CMSA is definitely the best algorithm in the comparison in regards to solution quality.

In order to illustrate the comparison regarding the solution quality for total weighted coverage, in Fig. 3 we show a boxplot for each algorithm considering all the feature models. Note that several outliers for all algorithms are outside the worst range shown in the plot. The boxplot confirms again that CMSA is the best for 100% weighted coverage with a median value of 9 products, followed by HILP and HINLP with a median of 10.5 products, and PPGS with a median of 11 products. In addition, we can also appreciate that CMSA has a lower interquartile range, and all the quartile marks are lower in comparison to the other algorithms.

Fig. 3
figure 3

Number of products needed to achieve total coverage. For each feature model (16), there are 30 solutions obtained by independent replications

In light of the obtained results and with the intention of determining whether the results are of practical significance or not, we analyze the \({\hat{A}}_{12}\) statistic. In Table 5 we summarize the \({\hat{A}}_{12}\) statistic values for all models and algorithms. The differences between CMSA and the rest of algorithms are quite large. Specifically, CMSA beats PPGS in 82.68%, HINLP in 79.90%, HILP in 81.97% and pICPL in 90.31% of the runs. Therefore, CMSA is able of generating test suites with maximum levels of coverage, and obtain better results than the other algorithms with a high probability.

Table 5 Vargha and Delaney’s statistical test results (\({\widehat{A}}_{12}\)) for total coverage

7 Conclusions

In this work we have applied a novel matheuristic approach (CMSA) to the Prioritized Pairwise Test Data Generation Problem, aiming to ease the task of testing on large SPLs. Our main contribution is the adaptation of the CMSA algorithm to this problem for SPL, relating the CMSA algorithm to the specific nuances of the problem.

We present the empirical results derived from the evaluation of our CMSA approach on the introduced benchmark of feature models. We compare CMSA with four different approaches to tackle the problem of prioritized pairwise test data generation for SPL. Regarding the solution quality, our analysis showed an improvement in terms of the quality metric, which is better (lower in terms of the number of products) in almost all instances of the benchmark and for all percentages of weighted coverage. In addition, in most comparisons the test suites computed by CMSA are statistically significantly better than those computed by the other algorithms.

Testing on a SPL means a high cost in resources and time due to the effort devoted to the testing phase of even one single product, which can require several hours. Therefore, it is straightforward to think that the best approach is the one that reduces the size of the test suite, in this case our proposal: CMSA. In addition, the execution of the algorithm only requires a few minutes, much less than testing one single product in most of the scenarios. A general conclusion is that CMSA is clearly the best approach for computing prioritized pairwise test data. On the other hand, the approach based on nonlinear integer programming (HINLP) is able to obtain good quality solutions in only a few seconds, then it is the best option when a good solution is immediately needed. This could be the case when testing a single product of the SPL only requires a few seconds.

There remains one Achilles’ heel clearly identified in our proposal, the execution time is higher than several approaches studied here. Future work will require a deeper analysis of the performance and quality of the generation of random products, that is the baseline of the construct phase of CMSA. We plan to assess whether CPLEX, the optimizer used to solve the integer programming problems found in this work, is able to provide enough different products to obtain the diversity that every search algorithm requires.