
1 Introduction

A common observation in many areas of AI (e.g., SAT or CSP solving) and machine learning is that no single algorithm dominates all others across all instances. To exploit this complementarity of algorithms, algorithm selection systems [6, 8, 11] are used to select a well-performing algorithm for a given new instance. Algorithm selectors such as SATzilla [12] and 3S [7] demonstrated in several SAT competitions that they can outperform pure SAT solvers by a large margin (see, e.g., the results of the SAT Challenge 2012).

An open problem in algorithm selection is that the machine learning model sometimes fails to select a well-performing algorithm, e.g., because of uninformative instance features. An extension of algorithm selection is therefore to select a schedule of multiple algorithms, at least one of which performs well.

To date, a fair comparison of such algorithm schedule selectors is missing, since every publication used a different benchmark set and some implementations (e.g., 3S) are not publicly available (for licensing reasons). To study the strengths and weaknesses of such schedulers in a fair manner, we implemented well-known algorithm scheduling approaches (i.e., Sunny [1] and dynamic schedules inspired by 3S [7]) in the flexible framework flexfolio (the successor of claspfolio 2 [5]) and studied them on the algorithm selection library ASlib [3].

2 Per-instance Algorithm Scheduling

Similar to the per-instance algorithm selection problem [11], the per-instance algorithm scheduling problem is defined as follows:

Definition 1 (Per-instance Algorithm Scheduling Problem). Given a set of algorithms \(\mathcal {P}\), a set of instances \(\mathcal {I}\), a runtime cutoff \(\kappa \), and a performance metric \(m: \varSigma \times \mathcal {I}\rightarrow \mathbb {R}\), where \(\varSigma \) denotes the set of all algorithm schedules, the per-instance algorithm scheduling problem is to find a mapping \(s: \mathcal {I}\rightarrow \varSigma \) from an instance \(\pi \in \mathcal {I}\) to a (potentially unordered) algorithm schedule \(\sigma _\pi \in \varSigma \) in which each algorithm \(\mathcal {A}\in \mathcal {P}\) is assigned a runtime budget \(\sigma _\pi (\mathcal {A})\) between 0 and \(\kappa \), such that \(\sum _{\mathcal {A}\in \mathcal {P}} \sigma _\pi (\mathcal {A}) \le \kappa \) and \(\sum _{\pi \in \mathcal {I}} m(s(\pi ),\pi )\) is minimized.
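To make the definition concrete, the following minimal sketch (Python; all names are hypothetical and not part of any of the systems discussed here) represents a schedule as a mapping from algorithms to time budgets, checks the budget constraint, and counts the timeouts a schedule incurs on a set of instances; counting timeouts is the simple instantiation of m that the timeout-minimal schedulers discussed below optimize.

```python
# Minimal sketch (hypothetical names): a schedule sigma assigns each algorithm
# a time budget; it is feasible if the budgets stay within the cutoff kappa.

def is_feasible(sigma, kappa):
    """sigma: dict mapping algorithm name -> time budget in seconds."""
    return all(0.0 <= t <= kappa for t in sigma.values()) and sum(sigma.values()) <= kappa

def solves(sigma, runtimes):
    """An (unordered) schedule solves an instance if at least one algorithm
    finishes within its assigned budget.
    runtimes: dict mapping algorithm name -> true runtime on this instance."""
    return any(runtimes.get(a, float("inf")) <= t for a, t in sigma.items())

def num_timeouts(sigma, instances):
    """Number of instances the schedule fails to solve -- the quantity that
    timeout-minimal schedules minimize on training data.
    instances: list of runtime dicts, one per instance."""
    return sum(0 if solves(sigma, rt) else 1 for rt in instances)
```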

The algorithm scheduler aspeed [4] addresses this problem by using a static algorithm schedule; i.e., aspeed applies the same schedule to all instances. The schedule is optimized with an answer set programming [2] solver to obtain a timeout-minimal schedule on the training instances. The scheduler aspeed either uses a second optimization step to determine a well-performing ordering of the algorithms or sorts the algorithms by their assigned times, in ascending order (such that a wrongly selected solver does not waste too much time).
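aspeed's actual ASP encoding is not reproduced here; the following illustrative brute force over a coarse budget grid (practical only for very few algorithms) merely shows the objective of a timeout-minimal static schedule and the ascending-time alignment, reusing num_timeouts() from the sketch above.

```python
from itertools import product

def timeout_minimal_static_schedule(algorithms, instances, kappa, step=60):
    """Illustrative brute force (NOT aspeed's ASP encoding): return the
    feasible static schedule with the fewest timeouts on the training
    instances, aligned by ascending time budget."""
    grid = range(0, int(kappa) + 1, step)
    best, best_timeouts = None, float("inf")
    for budgets in product(grid, repeat=len(algorithms)):
        if sum(budgets) > kappa:
            continue
        sigma = dict(zip(algorithms, budgets))
        timeouts = num_timeouts(sigma, instances)
        if timeouts < best_timeouts:
            best, best_timeouts = sigma, timeouts
    # align the schedule: run algorithms with smaller budgets first,
    # as aspeed's simple ordering heuristic does
    aligned = sorted(best.items(), key=lambda kv: kv[1])
    return aligned, best_timeouts
```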

Systems such as 3S [7], SATzilla [12] and claspfolio 2 [5] combine static algorithm schedules (also called pre-solving schedules) and classical algorithm selection. All these systems run the schedule for a small fraction of the runtime budget \(\kappa \) (e.g., 3S uses \(10\%\) of \(\kappa \)), and if this pre-solving schedule fails to solve the given instance, they apply per-instance algorithm selection to run an algorithm predicted to perform well. 3S and claspfolio 2 use mixed integer programming and answer set programming solvers, respectively, to obtain a timeout-minimal pre-solving schedule. SATzilla uses a grid search to obtain a pre-solving schedule that optimizes the performance of the entire system.
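A minimal sketch of this combination, with a hypothetical interface rather than the actual APIs of 3S, SATzilla, or claspfolio 2:

```python
def run_with_presolving(instance, presolvers, selector, run, kappa):
    """Sketch of the pre-solving-plus-selection scheme (hypothetical interface):
      presolvers: list of (algorithm, budget) pairs; the budgets typically sum
                  to a small fraction of kappa (e.g., 10% as in 3S),
      selector:   callable mapping an instance to a single algorithm,
      run:        callable run(algorithm, instance, budget) -> True if solved."""
    used = 0.0
    for algo, budget in presolvers:        # static pre-solving schedule first
        if run(algo, instance, budget):
            return algo
        used += budget
    chosen = selector(instance)            # then per-instance algorithm selection
    return chosen if run(chosen, instance, kappa - used) else None
```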

The algorithm scheduler Sunny [1] determines the schedule for a new instance \(\pi \) by first identifying the set \(\mathcal {I}_{k}\) of the k training instances closest to \(\pi \) in instance feature space, and then assigning each algorithm a runtime proportional to the number of instances in \(\mathcal {I}_{k}\) it solved. The algorithms are sorted by their average PAR10 scores on \(\mathcal {I}_{k}\) in ascending order (which corresponds to running the algorithm with the best expected performance first).
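The neighborhood \(\mathcal {I}_{k}\) can be computed with a plain k-nearest-neighbor lookup in feature space; a minimal sketch (omitting the feature normalization and imputation a real system would need):

```python
import numpy as np

def nearest_neighbors(features, train_features, k):
    """Indices of the k training instances closest (Euclidean distance) to the
    given feature vector -- the neighborhood I_k used by Sunny and, below, by
    ISA and TSunny."""
    dists = np.linalg.norm(np.asarray(train_features) - np.asarray(features), axis=1)
    return np.argsort(dists)[:k]
```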

3 Instance-Specific Aspeed (ISA)

Kadioglu et al. [7] proposed a variant of 3S that uses per-instance algorithm schedules instead of a fixed split between static pre-solving schedule and algorithm selection. In order to evaluate the potential of per-instance timeout-optimized scheduling, we developed the scheduler ISA, short for instance-specific aspeed. Inspired by Kadioglu et al. [7], our implementation uses k-nearest neighbor (k-NN) to identify the set \(\mathcal {I}_{k}\) of training instances closest to a given instance \(\pi \) and then applies aspeed to obtain a timeout-minimal schedule for them.

During offline training, we have to determine a promising value for the neighborhood size k. In our experiments, we evaluated k values between 1 and 40 by cross-validation on the training data and stored the best-performing value for online use. We chose this small upper bound for k to ensure a feasible runtime of the scheduler (in our experiments, less than 1 second). Furthermore, to reduce the runtime of the scheduler, we pruned the set of training instances, omitting all instances that were either solved by every algorithm or solved by none within the cutoff time.

For each new instance, ISA first computes the k nearest neighbor instances from the reduced training set. This neighbor set is passed to aspeed [4], which returns a timeout-minimal, unordered schedule for these neighbors. The schedule is finally aligned by sorting the time slots in ascending order.
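Putting the pieces together, a sketch of ISA's online step, reusing the k-NN helper and the illustrative timeout-minimal brute force from the sketches above in place of the actual call to aspeed:

```python
def isa_schedule(features, train_features, train_runtimes, algorithms, kappa, k):
    """Sketch of ISA's online step: restrict the training data to the k nearest
    neighbors and compute a timeout-minimal schedule for this subset (ISA
    itself calls aspeed for this step). The result is already aligned by
    ascending time budget."""
    idx = nearest_neighbors(features, train_features, k)
    neighbor_runtimes = [train_runtimes[i] for i in idx]
    aligned, _ = timeout_minimal_static_schedule(algorithms, neighbor_runtimes, kappa)
    return aligned
```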

4 Trained Sunny (TSunny)

To offer a form of scheduling with less overhead in the online phase than ISA, we implemented a modified version of Sunny [1] by adding a training phase. For a new problem instance, Sunny first selects a set \(\mathcal {I}_{k}\) of k training instances using k-NN. Then, time slots are assigned to the candidate algorithms: each solver gets one slot for each instance of \(\mathcal {I}_{k}\) it can solve within the given time, and a designated backup solver additionally gets one slot for each instance of \(\mathcal {I}_{k}\) that cannot be solved by any of the algorithms. Given this slot assignment, the length of a single time slot is computed by dividing the available time by the total number of slots. Finally, the schedule is aligned by sorting the algorithms by their average PAR10 scores on \(\mathcal {I}_{k}\), thereby running the most promising solver first.

Preliminary experiments for our implementation of this algorithm produced relatively poor results. Examining the schedules, we found that Sunny tends to employ many algorithms per schedule, which we suspected to be a weakness. Thus, we enhanced the algorithm by limiting the number of algorithms used in a single schedule to a specified number \(\lambda \).
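The following sketch combines the slot assignment described above with the \(\lambda \) cap; which \(\lambda \) algorithms are retained is a hypothetical choice of ours (here: best average PAR10 on the neighbors), and the freed budget is simply redistributed over the kept algorithms.

```python
def tsunny_schedule(neighbors, algorithms, backup, kappa, lam):
    """Sketch of the (T)Sunny slot assignment (hypothetical names): one slot per
    neighbor instance an algorithm solves within kappa; the backup solver
    (assumed to be one of the algorithms) additionally gets one slot per
    instance that no algorithm solves. At most lam algorithms are kept.
    neighbors: list of dicts mapping algorithm -> runtime (> kappa = unsolved)."""
    slots = {a: sum(1 for rt in neighbors if rt[a] <= kappa) for a in algorithms}
    unsolved = sum(1 for rt in neighbors if all(rt[a] > kappa for a in algorithms))
    slots[backup] += unsolved

    def avg_par10(a):
        return sum(rt[a] if rt[a] <= kappa else 10 * kappa for rt in neighbors) / len(neighbors)

    # keep the lam most promising algorithms and order them by average PAR10
    kept = sorted((a for a in algorithms if slots[a] > 0), key=avg_par10)[:lam]
    total_slots = sum(slots[a] for a in kept)
    slot_len = kappa / total_slots if total_slots else kappa
    return [(a, slots[a] * slot_len) for a in kept]  # best expected solver first
```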

Originally, Sunny is a lazy approach, i.e., it does not apply any training procedure after the benchmark data has been gathered. However, to obtain good values for our new parameter \(\lambda \), and also to improve the choice of the neighborhood size k, we added a training process to Sunny. Similar to ISA, different configurations of \(\lambda \) (ranging from 1 to the total number of solvers) and k (ranging from 1 to 100) are evaluated by cross-validation on the training data. To distinguish this enhanced algorithm from the original Sunny, we dubbed this trained version TSunny.
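The training phase itself is a plain grid search by cross-validation; a minimal sketch with a hypothetical cross_validate callback:

```python
from itertools import product

def tune_k_and_lambda(k_values, lam_values, cross_validate):
    """Sketch of the offline training step added for TSunny (and, for k only,
    used analogously by ISA): evaluate each configuration by cross-validation
    on the training data and keep the best one.
    cross_validate: callable mapping a configuration dict to a score to be
    minimized, e.g., the mean PAR10 over the validation folds (hypothetical)."""
    best_cfg, best_score = None, float("inf")
    for k, lam in product(k_values, lam_values):
        score = cross_validate({"k": k, "lambda": lam})
        if score < best_score:
            best_cfg, best_score = {"k": k, "lambda": lam}, score
    return best_cfg
```

For TSunny this would be called, e.g., as tune_k_and_lambda(range(1, 101), range(1, n_solvers + 1), cross_validate).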

5 Empirical Study

To compare the different algorithm scheduling approaches of ISA and Sunny, we implemented them in the flexible algorithm selection framework flexfolio and compared them to various other systems: the static algorithm scheduling system aspeed [4], the default configuration of flexfolio (which is similar to SATzilla [12] and claspfolio 2 [5] and includes a static pre-solving schedule), and the per-instance algorithm selector AutoFolio [9] (an automatically configured version of flexfolio that does not consider per-instance algorithm schedules). If not mentioned otherwise, we used the default parameter values of flexfolio. The comparison is based on the algorithm selection library ASlib [3], which is specifically designed to fairly measure the performance of algorithm selection systems. Version 1.0 of ASlib consists of 13 scenarios from a wide range of domains (SAT, MAXSAT, CSP, QBF, ASP, and operations research).

Table 1 shows the performance of the systems as the fraction of the gap closed between the static single best algorithm and the oracle (i.e., the performance of an optimal algorithm selector), using the PAR10 performance metric. As expected, the per-instance schedules (i.e., Sunny and ISA) performed better on average than aspeed’s static schedules. However, aspeed still obtains the best performance on SAT11-HAND. By comparing Sunny and TSunny, we see that parameter tuning substantially improved performance. Comparing TSunny and ISA, we note that their overall performance is similar but that each has advantages on different scenarios; thus, there is still room for improvement by selecting the better of the two on a per-scenario basis. Surprisingly, the per-instance schedules reached a performance similar to that of the state-of-the-art procedure AutoFolio (ISA: 0.71 vs. AutoFolio: 0.70); however, AutoFolio performed slightly more robustly, being among the best systems on 10 of the 13 scenarios. Nevertheless, ISA establishes new state-of-the-art performance on PREMAR-2013 (short for PREMARSHALLING-ASTAR-2013), and TSunny on PROTEUS-2014 and QBF-2011, according to the ongoing evaluation on ASlib.
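In other words, a system with PAR10 score \(m_{\mathrm {sys}}\) on a scenario obtains the score \((m_{\mathrm {SB}} - m_{\mathrm {sys}}) / (m_{\mathrm {SB}} - m_{\mathrm {oracle}})\), where \(m_{\mathrm {SB}}\) and \(m_{\mathrm {oracle}}\) denote the PAR10 scores of the single best algorithm and the oracle, respectively; 1.0 thus corresponds to oracle performance and 0.0 to no improvement over the single best algorithm.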

Table 1. Gap metric on PAR10: 1.0 corresponds to a perfect oracle score and 0.0 to the single best score. The best score for each scenario is highlighted in bold face, and all systems whose performance is not significantly worse than that of the best system are marked with a star (permutation test with \(100\ 000\) random permutations and \(\alpha = 0.05\); “Equal to Best”). All systems are implemented in flexfolio, except Sunny, for which the original implementation was used.
Table 2. Statistics of the schedules: neighborhood size k, average schedule size \(\varnothing |\sigma |\), and average position \(\varnothing suc\) of the successful solver in the schedule, for our systems aspeed, ISA, Sunny’ (a reimplementation of the lazy version of Sunny), and TSunny (the non-lazy, trained version of Sunny’).

Table 2 gives more insight into our systems’ behavior. It also includes our implementation of Sunny without training, dubbed Sunny’. Sunny (and also Sunny’) sets the neighborhood size k to the square root of the number of instances, whereas TSunny optimizes k on the training instances. The reason for TSunny’s better performance compared to Sunny is probably its much smaller value of k on all scenarios except SAT12-RAND. Also, TSunny’s average schedule size was smaller on nearly all scenarios (the exception being CSP-2010).

Comparing the static aspeed and the instance-specific aspeed (ISA), the average schedule size of aspeed is rather large, since aspeed has to compute a single static schedule that is robust across all training instances rather than only on a small subset. Surprisingly, the values of k for ISA and TSunny differ considerably, indicating that the best value of k depends on the scheduling strategy.

6 Conclusion and Discussion

We showed that per-instance algorithm scheduling systems can perform as well as algorithm selectors and even establish new state-of-the-art performance on 3 scenarios of the algorithm selection library [3]. Additionally, we found that the performance of the algorithm schedulers strongly depends on adjusting their parameters to each scenario, here the neighborhood size of the k-nearest-neighbor approach and the maximal size of the schedules.

In our experiments, we did not tune all possible parameters of Sunny and ISA within the flexible flexfolio framework; e.g., we fixed the pre-processing strategy for the instance features. Therefore, a future extension of this line of work would be to extend the configuration space of the automatically configured algorithm selector AutoFolio [9] to also cover per-instance algorithm schedules. Another extension could be to allow communication between the algorithms in a schedule [10].