1 Introduction

Experimental studies comparing different algorithms on a set of problem instances usually report that while a particular algorithm works well on one group of instances, it fails to outperform the competing algorithms on the others. In other words, there is no single algorithm that always performs best, as also stated by the No Free Lunch (NFL) theorem [2]. Algorithm Selection [3] is an automated way of choosing the (near-)best algorithm(s) for solving a given problem instance. Thus, the target instances can be solved with the help of multiple algorithms rather than just one. In that respect, algorithm selection offers an opportunity to defeat any given problem-specific algorithm designed by domain experts, as in the SAT competitions.

A traditional algorithm selection approach derives performance prediction models [4] that estimate the performance of a given set of algorithms on a new problem instance. Generating such models requires knowing the performance of these algorithms on a suite of training instances. Besides, a set of features that effectively characterizes the target problem's instances is essential. An algorithm selection model can then simply map these instance features to the algorithms' performance. However, it can be challenging to generate the performance data in the first place. Finding representative instance features can also be complicated, depending on the available problem domain expertise. Regarding the performance data, the main issue is the cost of generating it. In particular, if the computational resources are limited, producing a single algorithm selection dataset can take days, months, or even years [5]. This drawback can deter anyone who would like to apply algorithm selection to a new problem domain.
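To make the traditional pipeline concrete, here is a minimal sketch with synthetic placeholder data rather than any real benchmark: one regression model per algorithm maps instance features to runtime, and the algorithm with the smallest predicted runtime is selected for an unseen instance.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.random((100, 10))          # 100 training instances, 10 features (synthetic)
runtimes = rng.random((100, 3)) * 1000   # runtimes of 3 algorithms on those instances

# One regression model per algorithm: instance features -> predicted runtime.
models = [RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, runtimes[:, a])
          for a in range(runtimes.shape[1])]

x_new = rng.random((1, 10))                        # features of an unseen instance
predicted = [m.predict(x_new)[0] for m in models]  # per-algorithm runtime estimates
best = int(np.argmin(predicted))                   # pick the cheapest predicted algorithm
```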

A collaborative filtering based algorithm selection technique, ALORS [1], was introduced to resolve this issue to a certain extent. Collaborative filtering [6] is a popular recommender systems approach for predicting the interest of a user in an item, such as a book or a movie, that s/he has not seen yet. The prediction particularly relies on the user's known preferences on multiple items, such as the scores s/he has given to them. By taking all the other users' partial preferences into account, the existing partial preference information of this user can be utilized to determine whether s/he will like the unseen items. From this perspective, if two users share similar preferences, their preferences on the unobserved items are also expected to be similar. ALORS derives its collaborative filtering capability from matrix completion, applied in this context to the performance data. As its name suggests, matrix completion is about filling the unavailable entries of an incomplete matrix. Using matrix completion, ALORS was shown to outperform the single best algorithm on various problems with up to 90% sparsity. Its matrix completion component also showed success in process mining [7].

The present study focuses on incorporating Active Matrix Completion (AMC) [8, 9] into algorithm selection, considering the data generation cost and quality. The AMC problem is defined as follows: for a given matrix \(\mathcal{M}\) with unobserved entries, determine a set of new entries to be queried, \(Q\), so that \(\mathcal{M}' = \mathcal{M} \cup Q\) carries sufficient information for successful completion compared to \(\mathcal{M}\). Minimizing \(|Q|\) can also be targeted during querying. AMC is thus a practical way of determining the most profitable entries to add to a matrix for improved matrix completion. For this purpose, I apply various matrix completion techniques, both individually and combined as ensembles (Ensemble Matrix Completion), in the form of AMC to specify the most suitable instance-algorithm entries to be sampled. In [10], a single matrix completion method was used to perform both sampling and completion in ALORS. This study extends [10] by considering a set of existing matrix completion methods while examining their combined power as AMC ensembles. For analysis, a series of experiments is performed on the Algorithm Selection library (ASlib) [5] datasets.
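As a minimal illustration of the AMC setting, not the paper's specific method, the generic loop below repeatedly selects a query set \(Q\), observes the true values so that \(\mathcal{M}' = \mathcal{M} \cup Q\), and finally completes the enriched matrix; choose_queries, observe and complete are placeholders for any concrete strategy.

```python
import numpy as np

def active_matrix_completion(M, choose_queries, observe, complete, rounds):
    """Generic AMC loop; np.nan marks the unobserved entries of M.

    choose_queries(M) -> list of (row, col) positions Q to sample next
    observe(r, c)     -> true value of entry (r, c)
    complete(M)       -> fully filled copy of M
    """
    for _ in range(rounds):
        for r, c in choose_queries(M):  # pick the query set Q
            M[r, c] = observe(r, c)     # M' = M augmented with Q
    return complete(M)                  # completion on the enriched matrix
```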

In the remainder of the paper, Sect. 2 gives background information on both algorithm selection and matrix completion. The proposed approach is detailed in Sect. 3. Section 4 provides an empirical analysis through the computational results. The paper concludes with a summary and discussion in Sect. 5.

2 Background

2.1 Algorithm Selection

Algorithm Selection has been applied to various domains such as Boolean Satisfiability [11], Constraint Satisfaction [12], Combinatorial Optimization [13], Machine Learning [14] and Game Playing [15]. The majority of algorithm selection methods operate Offline, meaning that the selection process is performed before the target problem instance is solved. This study is concerned with Offline algorithm selection. Among the existing offline algorithm selection studies, SATZilla [11] is known as one of the premier works. It incorporates pre-solvers and backup solvers to address easy and hard instances, respectively. A rather traditional runtime prediction strategy is utilized to determine the best possible algorithm on the remaining instances; instance features together with the performance data are used to generate the prediction models. SATZilla was further extended [16] with cost-sensitive models, as also done in [17]. In [18], a collaborative filtering based recommender system, i.e. [19], was adapted for AS. Hydra [20], inspired by ISAC [21], was developed to apply parameter tuning for generating algorithm portfolios/diverse algorithm sets from a single algorithm. 3S [22] was devised particularly to offer algorithm schedules determining which algorithm to run, and for how long, on each target problem instance; for this purpose, a resource-constrained set covering problem was solved with column generation. In [23], deep learning was applied to algorithm selection when instance features are unavailable: the corresponding instance files are converted into images so that deep learning's feature extraction capabilities can be used directly. As a high-level strategy for algorithm selection, AutoFolio [24] was introduced to perform parameter tuning to arrive at the best possible algorithm selection setting. Some of these AS systems, as well as a variety of new designs, competed in the two AS competitions [25], which took place in 2015 [26] and 2017 [27]. Algorithm scheduling was particularly critical for successful AS designs like ASAP [28].

AS has also been studied via Online methods that choose algorithms on-the-fly while a problem instance is being solved, which makes it particularly suitable for optimization. Selection hyper-heuristics [29] mainly cover this field by combining selection with additional mechanisms. Adaptive Operator Selection (AOS) [30] additionally refers purely to Online selection.

2.2 Matrix Completion

Although data might be present in different forms, it is usually not perfect, with respect to both quality and availability. Regarding quality, the data at hand might be sufficient in size yet misleading or carrying limited useful information. Regarding availability, some data entries might be missing even though complete data is required. Thus, in reality, data-centric systems might be required to deal with both issues. In recommender systems, the latter issue is mainly approached as the matrix completion problem, for collaborative filtering [6]. The Netflix challenge [31] was the primary venue that made collaborative filtering popular. In the challenge, the goal was to predict the missing 99% of a large matrix of user scores on movies, reflecting the users' preferences, from the 1% of observed entries. In the literature, the methods addressing this problem are categorized as memory-based (neighbourhood methods) and model-based (latent factor models) approaches. The memory-based approaches depend solely on the available incomplete data to fill the missing entries. The model-based approaches look for ways to generate models that can perform matrix completion; they are known to be effective especially in cases of high incompleteness.

Matrix factorization [32] has been widely used by the model-based approaches in collaborative filtering. It extracts latent (hidden) factors that characterize the given incomplete data such that the missing entries can be effectively predicted. A Probabilistic Matrix Factorization (PMF) method [33] that scales linearly with the number of observations was introduced. CofiRank [34] was devised as a matrix factorization method optimizing an upper bound on the Normalized Discounted Cumulative Gain (NDCG) criterion for given rank matrices. ListRank-MF [35] was proposed as an extension of matrix factorization with a list-wise learning-to-rank algorithm. In [36], the capabilities of matrix completion under noise were demonstrated. As a growing research direction in matrix factorization, non-negative matrix factorization [37, 38] has been studied mainly to deliver meaningful and interpretable factors.
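As a concrete example of the model-based family, the sketch below implements a textbook stochastic-gradient matrix factorization, not any of the specific methods cited above: the observed entries of M are approximated by a low-rank product U V^T, which then predicts the missing ones.

```python
import numpy as np

def factorize(M, k=5, steps=200, lr=0.01, reg=0.1, seed=0):
    """Fit M ~ U @ V.T on the observed entries (np.nan = unobserved)
    via SGD with L2 regularization; return the full reconstruction."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = M.shape
    U = rng.normal(scale=0.1, size=(n_rows, k))
    V = rng.normal(scale=0.1, size=(n_cols, k))
    observed = np.argwhere(~np.isnan(M))
    for _ in range(steps):
        for i, j in observed:
            err = M[i, j] - U[i] @ V[j]        # residual on one observed entry
            ui = U[i].copy()                   # keep old value for V's update
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * ui - reg * V[j])
    return U @ V.T                             # predictions for all entries
```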

Related to the present work, active learning has been incorporated into matrix factorization [8], where an AMC framework was proposed and tested with various techniques. In [39], AMC was studied to address the completion problem for low-rank matrices; an AMC method called Order&Extend was introduced to perform both sampling and completion together.

3 Method

A simple data sampling approach for algorithm selection, exemplified in Fig. 1, was investigated in [10]. The idea is to apply matrix completion to predict unobserved entries that are both cheap and informative. As detailed in Algorithm 1, a given completion method \(\theta \) is first applied. Then, for each instance (matrix row), at most n entries to be sampled are determined, where n is set to maintain balanced sampling across the instances. The top n missing entries with the best expected performance are chosen, and the corresponding algorithms are then actually run on those instances. For a target dataset whose performance metric is runtime, the best entries are also the ones requiring the least computational effort to sample. After running the selected algorithms on their target instances, an enriched partial matrix is delivered, assuming that some entries are initially available.

Fig. 1. An active matrix completion example

For evaluation, the matrix needs to be completely filled following this active matrix completion sampling step. As in [1, 10], the completion is carried out using the basic cosine-similarity approach commonly used in collaborative filtering. The completed matrix can then be utilized to determine the best/strongest algorithm on each instance that is partially known through the incomplete, observed entries. Besides that, it can also be used to perform traditional algorithm selection, which additionally requires instance features. Instance features are ignored here, as the focus is solely on generating sufficiently informative algorithm selection datasets.
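A minimal sketch of such a cosine-similarity completion is given below; it is a reconstruction following the above description rather than the exact code of [1, 10]. Missing entries are zero-filled only for the similarity computation, and each missing entry is predicted as a similarity-weighted average over the rows where that column is observed.

```python
import numpy as np

def cosine_complete(M):
    """Memory-based completion via row-row cosine similarity."""
    X = np.nan_to_num(M)                     # zero-fill for similarity only
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                  # guard rows with no observations
    R = X / norms
    S = R @ R.T                              # row-row cosine similarity
    np.fill_diagonal(S, 0.0)                 # ignore self-similarity
    out = M.copy()
    for i, j in np.argwhere(np.isnan(M)):
        known = ~np.isnan(M[:, j])           # rows observing column j
        w = S[i, known]
        if np.abs(w).sum() > 0:
            out[i, j] = w @ M[known, j] / np.abs(w).sum()
        else:
            out[i, j] = np.nanmean(M)        # fallback: global mean
    return out
```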

Algorithm 1. The AMC-based sampling procedure: complete the partial matrix with \(\theta \), then select at most n expectedly best unobserved entries per instance for sampling
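Since only the textual description of Algorithm 1 is reproduced here, the following sketch reconstructs it under that description: complete the partial matrix with \(\theta \), then, per instance row, select at most n unobserved entries with the best (cheapest) predicted values for actual sampling.

```python
import numpy as np

def select_entries(M, complete, n):
    """Sampling step of Algorithm 1 (reconstruction): return up to n
    (instance, algorithm) positions per row with the best predictions."""
    predicted = complete(M)                  # theta's estimates for missing entries
    queries = []
    for i in range(M.shape[0]):
        missing = np.where(np.isnan(M[i]))[0]
        if missing.size == 0:
            continue
        best = missing[np.argsort(predicted[i, missing])[:n]]  # smallest = best rank/runtime
        queries.extend((i, j) for j in best)
    return queries
```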

The same completion approach is followed, yet with different matrix completion methods as the sampling strategies, as listed in Table 1. The first five algorithms (\(\vartheta \)) perform rather simple and very cheap completion. Following [1], more complex approaches, in particular the model-based ones or those performing heavy optimization, are avoided. Besides these methods, three simple ensemble-based approaches using \(\vartheta \), i.e. MeanE, MedianE and MinE, are implemented. Finally, pure random sampling (Random) from [1] and the matrix completion method (MC) used in [10] are utilized for comparison.

Table 1. Matrix completion algorithms \(\theta \) (all in \(\vartheta \) with their default values from fancyimpute 0.1.0; \(\vartheta \) = {SimpleFill, KNN, SoftImpute, IterativeSVD, MICE})
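The ensembles admit a compact implementation; the sketch below is illustrative, assuming each base method in \(\vartheta \) first produces a full completion, after which the predictions are aggregated element-wise while observed entries are kept untouched.

```python
import numpy as np

def ensemble_complete(M, base_completions, mode="mean"):
    """MeanE / MedianE / MinE over a list of fully completed copies of M."""
    stack = np.stack(base_completions)       # shape: (k, rows, cols)
    agg = {"mean": np.mean, "median": np.median, "min": np.min}[mode]
    combined = agg(stack, axis=0)            # element-wise aggregation
    out = M.copy()
    missing = np.isnan(M)
    out[missing] = combined[missing]         # only fill unobserved entries
    return out
```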

4 Computational Results

The computational experiments are performed on 13 Algorithm Selection library (ASlib) [5] datasets (Table 2). The performance data of each ASlib dataset is converted into rank form as in [1]. For each dataset, AMC is utilized to sample data at decreasing incompleteness levels, from 90% incompleteness down to 10%. As the starting point, AMC is applied to each dataset after randomly picking an initial 10% of its entries; AMC is then tested by gradually sampling new data in 10% increments. To account for randomness, the initial 10% random sampling is repeated 10 times on each dataset.
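The protocol can be summarized by the following hedged sketch, where P stands for the fully known rank matrix used as ground truth and choose_queries for any AMC sampler returning roughly 10% of the entries per step.

```python
import numpy as np

def run_protocol(P, choose_queries, repeats=10, seed=0):
    """From 90% incompleteness down to 10%: random 10% start, then
    ~10% more entries per AMC step; yields the matrix at each level."""
    rng = np.random.default_rng(seed)
    step = max(1, P.size // 10)                 # ~10% of all entries
    for _ in range(repeats):                    # repeated random starts
        M = np.full(P.shape, np.nan)
        start = rng.permutation(P.size)[:step]
        M.flat[start] = P.flat[start]           # initial random 10%
        for _ in range(8):                      # reach 20%, 30%, ..., 90% observed
            for i, j in choose_queries(M):      # AMC picks new entries ...
                M[i, j] = P[i, j]               # ... and queries their true values
            yield M.copy()                      # evaluate at this level
```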

Table 2. The ASlib benchmarks, with the number of CPU days required to generate them
Table 3. Average time gain achieved in hours by MICE compared to Random (0.9 is ignored since it is the initial random sampling for both methods)

Figure 2 reports the performance of each active matrix completion method in terms of time, i.e. the data generation cost. Time (\(t_{C}\)) is normalized w.r.t. the least costly (\(t_{C_{best}}\)) and the most costly (\(t_{C_{worst}}\)) samples for the given incompleteness levels, as \(\frac{ t_{C} - t_{C_{best}} }{ t_{C_{worst}} - t_{C_{best}} }\). For the SAT12 datasets, all the methods significantly improve on random sampling (Random) by choosing less costly instance-algorithm samples. For instance, adding 10% performance data after the initially selected 10% for SAT12-RAND with MICE can save \(\sim \)565 CPU hours (\(\sim \)24 days) compared to Random. The time gain can reach \(\sim \)1967 CPU hours (\(\sim \)82 days) on PROTEUS-2014 (Table 3). Significant performance differences are also achieved on the SAT11 and ASP-POTASSCO datasets.

For PROTEUS-2014, MC, SimpleFill and KNN start choosing costly samples after the observed entries exceed 60% of the whole dataset. The remaining algorithms show similar behavior when the observed entries reach 80%. For MAXSAT12-PMS, only SimpleFill, MICE and MeanE are able to detect cheap entries while the rest consistently pick the costly ones. On CSP-2010, PRE.-ASTAR-2015 and QBF-2011, the methods query costly instance-algorithm pairs. The common characteristic of these three datasets is a limited number of algorithms: 2, 4 and 5, respectively. The underlying reason is that, for matrix completion, high-dimensional matrices increase the chance of high-quality completion. For instance, in the aforementioned Netflix challenge, the dataset is composed of \(\sim \)480K users and \(\sim \)18K movies. Since both dimensions of the user-movie matrix are large, having only 1% of the complete data can still be more practical for matrix completion than a small-scale matrix. Still, choosing costly entries can be helpful for the quality of the completion when the observed entries are misleading. Regarding the ensemble methods, despite the success of MeanE, MinE delivers poor performance while MedianE achieves average performance compared to the rest. MC, as the method running the exact same completion method for both sampling and completion, achieves the best performance on SAT12-INDU and SAT12-RAND together with the other matrix completion methods. However, it shows average performance on the remaining datasets, except CSP-2010 where it delivers the worst performance.

Fig. 2. Normalized time ratio (smaller is better) for varying incompleteness levels, \(0.1 \rightarrow 0.9\) (0.9 refers to the initial 10% data case)

Fig. 3. Average rank (smaller is better) for varying incompleteness levels, \(0.1 \rightarrow 0.9\) (0.9 refers to the initial 10% data case)

Figure 3 presents the matrix completion performance in terms of average ranks, following each AMC application. The average rank calculation refers to choosing the lowest-ranked algorithm for each instance w.r.t. the filled matrix, then evaluating that algorithm's true rank. For SAT12-RAND, all the methods significantly outperform Random except KNN and MinE, which perform similarly to Random; SimpleFill, IterativeSVD, MC, MICE and MeanE show particularly superior performance. On SAT12-INDU, especially SimpleFill, MC, MICE and MeanE provide significantly better rank performance, yet when the observed entries reach 60%, their rank performance starts to degrade. For SAT12-HAND, the significant improvement in terms of time does not translate into outperforming Random. However, IterativeSVD, MICE, MeanE and MedianE deliver performance similar to Random when the observed entries are below 50%. For the ASP-POTASSCO, PROTEUS-2014 and SAT11 datasets, similar rank performance can be seen across the matrix completion methods; the majority of them match the performance of Random throughout all the tested incompleteness levels. On the CSP-2010 dataset, significantly better rank performance is achieved compared to Random, yet, as mentioned above, more computationally costly samples are requested. Although expensive, these samples are able to elevate the quality of the matrix completion process. On the remaining datasets with relatively small algorithm sets, i.e. PRE.-ASTAR-2015 and QBF-2011, performance similar to Random is achieved; however, it should be noted that this average rank performance is delivered with more costly entries than Random. For the ensemble methods, their behavior regarding the cost of generating the instance-algorithm data is reflected in their average rank performance. Mirroring its average cost-saving performance on data generation/sampling, MC's rank performance is either outperformed or matched by the other tested matrix completion techniques.
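The average rank evaluation can be expressed compactly as below, a sketch consistent with the description above: filled is the completed rank matrix, true_ranks the fully known ground-truth ranks, and smaller is better.

```python
import numpy as np

def average_rank(filled, true_ranks):
    """Pick the predicted-best algorithm per instance, score its true rank."""
    picks = np.argmin(filled, axis=1)        # predicted-best algorithm per row
    return float(np.mean(true_ranks[np.arange(len(picks)), picks]))
```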

Fig. 4. Overall rank of the matrix completion methods as AMC, based on (a) normalized time ratio and (b) average rank

Figure 4 shows the overall performance of AMC across all the ASlib datasets and the aforementioned incompleteness levels. In terms of generating cheap performance data, MICE comes out as the best approach, followed by an ensemble method, MeanE. Random delivers the worst performance; MinE, another ensemble method, also performs poorly, together with KNN. In terms of average rank, MICE and MeanE are again the best performing methods, while the overall rank performance of KNN and MinE is even worse than Random's. Besides that, MC performs similarly to Random.

5 Conclusion

This study applies active matrix completion (AMC) to algorithm selection for providing high-quality yet computationally cheap incomplete performance data. Algorithm selection is about automatically choosing algorithm(s) for a given problem instance. The selection process requires a set of algorithms and a group of problem instances. Performance data concerning the success of these algorithms on the instances plays a central role in algorithm selection performance. However, generating this performance data can be quite expensive. ALORS [1] applies matrix completion to perform algorithm selection with limited/incomplete performance data. The goal of this work is to deliver an informed data sampling approach that determines how the incomplete performance data should be generated from cheap-to-compute entries while maintaining data quality. For this purpose, a number of simple and fast matrix completion methods are utilized. The experimental results on the Algorithm Selection library (ASlib) benchmarks showed that AMC can provide substantial time gains in generating performance data for algorithm selection while delivering strong matrix completion performance, especially for the datasets with large algorithm sets.

The follow-up research plan covers investigating the effects of matrix completion on cold start, which is the traditional algorithm selection task, i.e. choosing algorithms for unseen problem instances. Next, the problem instance features will be utilized to further improve AMC performance. Afterwards, algorithm portfolios [43] across many AMC methods will be explored by extending the utilized AMC techniques. Finally, AMC will be targeted as a multi-objective optimization problem, minimizing the performance data generation time while maximizing the commonly used algorithm selection performance metrics.