1 Introduction

Graphical Models can concisely represent high-dimensional multivariate functions through a factorization into local functions. We consider discrete variables.

Constraint Networks and weighted variants such as Cost Function Networks (CFNs), aka (Weighted) Constraint Satisfaction Problems ((W)CSPs), aim at finding an assignment of all variables that minimizes a joint cost function defined as the sum of local functions (constraints being represented as functions with values in \(\{0,\infty \}\)). With Boolean variables, and a language restricted to clausal form, the (partial weighted Max)-SAT problem has the same target. Constraint Programming (CP), an extension of Constraint Networks including non-deterministic programming language features, can also easily capture these optimization problems by introducing cost variables [43].

In AI and statistics, probabilistic graphical models [29] use the same idea to concisely represent probability distributions over random variables. These models include Bayesian Networks and Markov Random Fields (MRFs). The problem of identifying a variable assignment that has maximum probability is called the Most Probable Explanation (MPE) in Bayesian Networks, or Maximum A Posteriori (MAP) in MRFs. This NP-hard problem has an extremely large application scope, e.g., in image processing or bioinformatics. By a simple \((-\log )\) transformation, these problems can be reduced to CFNs.

Graphical Models can also be easily encoded as 0/1 Linear Programming (01LP) problems, a standard language for Operations Research (OR). We consider two encodings, including one based on the so-called local polytope [20, 30, 47], which has several interesting properties.

In this paper, we extract probabilistic and deterministic graphical models from various areas, each using a specific language. This covers competitions in MaxSAT, constraint programming, probabilistic inference and repositories in probabilistic image processing and cost function networks. We encode them in these underlying languages and close relatives, from AI (CFN, MaxSAT, MRF), CP, and OR (01LP). These benchmarks are traditionally used in competitions relying on a single language with dedicated solvers. We compare exact solvers which are all state-of-the-art for their own language on these encodings. We then define a novel portfolio hybrid solver exploiting them.

2 Combinatorial optimization languages

In this section we briefly describe the combinatorial optimization languages that will be used.

[CFN] Cost Function Networks, or Weighted Constraint Networks, extend Constraint Networks by using non-negative cost functions instead of constraints [38].

Definition 1

A Cost Function Network (CFN) is a triple (X,W,k) where X = {1,…,n} is a set of n discrete variables, W is a set of non-negative functions, and k is a (possibly infinite) maximum cost. Each variable i ∈ X has a finite domain D_i of values that can be assigned to it. A function w_S ∈ W, with scope \(S\subseteq X\), is a function \(w_{S}: D_{S} \mapsto \{\alpha \in \mathbb {N}\cup \{k\} : \alpha \leq k\}\), where D_S denotes the Cartesian product of all D_i for i ∈ S.
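To make Definition 1 concrete, the following sketch evaluates the joint cost of a complete assignment as the sum of the local costs, capped at the forbidden cost k. The data structures (scope/table pairs) are illustrative, not any solver's actual format.

```python
# Minimal sketch of a CFN and the cost of a complete assignment.
def assignment_cost(functions, assignment, k):
    """Sum local costs; saturate at k, which marks forbidden assignments."""
    total = 0
    for scope, table in functions:  # table maps tuples over the scope to costs
        cost = table[tuple(assignment[i] for i in scope)]
        total = min(total + cost, k)  # costs are capped at the maximum cost k
    return total

k = 1000
# Two variables with domains {0,1}; one soft unary and one binary function
# containing a forbidden (cost-k) tuple.
functions = [((0,), {(0,): 3, (1,): 0}),
             ((0, 1), {(0, 0): 0, (0, 1): k, (1, 0): 2, (1, 1): 0})]

print(assignment_cost(functions, [0, 0], k))  # 3: soft cost only
print(assignment_cost(functions, [0, 1], k))  # 1000: forbidden (hard) pair
```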

In CFNs, the cost of a complete assignment is the sum of all cost functions. A solution is an assignment with cost strictly less than k; the cost k therefore denotes forbidden assignments and is used to express hard constraints. A solution of minimum cost is sought.

[MRF] Markov Random Fields define a joint probability distribution. The terminology of Graphical Models (GMs) originally designates probabilistic graphical models such as Markov Random Fields (MRFs) and Bayesian Networks (BNs) [29]. In this paper, we restrict ourselves to MRFs because they do not impose any restriction on the local functions that can be used in the decomposition of the joint probability distribution (BNs use local conditional probabilities with a normalization requirement).

Definition 2

A discrete Markov Random Field (MRF) is a pair (X,Φ) where X = {1,…,n} is a set of n random variables, and Φ is a set of potential functions. Each variable i ∈ X has a finite domain D_i of values that can be assigned to it. A potential function ϕ_S ∈ Φ, with scope \(S\subseteq X\), is a function \(\phi _{S}: D_{S} \mapsto \mathbb {R}\cup \{\infty \}\).

The probability of a tuple t ∈ D_X is defined as:

$$P(t) \propto \prod\limits_{\phi_{S}\in {\Phi}} \exp(-\phi_{S}(t[S])) = \exp(-\sum\limits_{\phi_{S}\in {\Phi}} \phi_{S}(t[S]))$$

where t[S] denotes the restriction of t to the set of variables S. The additive potentials ϕ_S are called energies, by analogy with statistical physics. Alternatively, multiplicative \(\exp (-\phi _{S}(t[S]))\) potentials can be used.
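As a small numeric illustration (all names are ours, and the potentials are made up), the product-of-exponentials form above equals the exponential of the summed energies, so an assignment of maximum unnormalized probability is exactly an assignment of minimum total energy:

```python
import math

# Sketch: with additive potentials (energies), the unnormalized probability of
# a tuple is exp(-sum of local energies), so maximizing P(t) is the same as
# minimizing the total energy.
potentials = [((0,), {(0,): 1.5, (1,): 0.0}),
              ((0, 1), {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 0.0})]

def energy(t):
    return sum(table[tuple(t[i] for i in scope)] for scope, table in potentials)

def unnorm_prob(t):
    return math.exp(-energy(t))

tuples = [(a, b) for a in (0, 1) for b in (0, 1)]
best = min(tuples, key=energy)  # minimum-energy (MAP) assignment
assert best == max(tuples, key=unnorm_prob)  # same as maximum probability
```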

In this paper, we consider the MAP query that aims at finding a complete assignment of maximum probability (or equivalently, minimum energy).

[WPMS] Weighted Partial MaxSAT problems are CFNs restricted to Boolean domains and a language of weighted clauses [36].

Definition 3

A Weighted Partial MaxSAT (WPMS) instance is defined as a set of pairs 〈C,w〉 and an upper bound k. Each C is a clause and w is a number in \(\mathbb {N} \cup \{ k\}\), the weight of clause C. A clause is a disjunction of literals. A literal is a Boolean variable or its negation.

A clause with weight ≥ k is a hard clause, otherwise it is soft. The objective is to find an assignment to the variables appearing in the clauses that minimizes the sum of the weights of all falsified clauses, which must be less than k.

[01LP] A 0/1 Linear Program is defined by a linear objective function over a set of 0/1 variables to minimize under a conjunction of linear equalities and inequalities [25].

[CP] Constraint Programming problems are defined by a set of discrete variables and a set of constraints. The aim is to minimize the value of a given objective variable while satisfying all constraints [45].

3 Translations between formalisms

In this section we present encodings between graphical models represented in each of the AI/OR/CP languages we presented. We summarize in Table 1 for each input formalism the different translations used to produce every instance in the corresponding output formalism.

Table 1 Summary of translations between formalisms and possible issues

[MRF] Markov Random Field.

With additive potentials, MRFs are essentially equivalent to CFNs except for the fact that they can use arbitrary real-valued potential functions instead of non-negative integer costs.Footnote 1 Additive MRFs can therefore be reduced to CFNs using a fixed decimal point representation of energies, which are then scaled to integers and shifted to enforce non-negativity. This preserves optimal solutions.

Multiplicative MRFs can be transformed to additive MRFs using a simple \((-\log )\) transform, and then to CFNs [17, 18]. Conversely, CFNs can be transformed to multiplicative MRFs (as in the uai MARKOV format) by exponentiating costs.Footnote 2 Costs are all shifted by the same amount so that the largest multiplicative potentials are equal to 1. Hard costs (≥k) are translated to a zero multiplicative potential (infinite energy) to preserve the ability to prune domain values based on constraint reasoning.
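A minimal sketch of the fixed decimal point reduction from additive energies to non-negative integer costs described above (the function name, the precision, and the choice of k are illustrative):

```python
import math

# Sketch of the additive-MRF-to-CFN reduction: real energies are rounded to a
# fixed number of decimal digits, scaled to integers, and shifted so that all
# costs are non-negative. Infinite energies map to the forbidden cost k.
def energies_to_costs(energies, digits=2, k=10**9):
    scale = 10 ** digits
    finite = [e for e in energies if math.isfinite(e)]
    shift = min(round(e * scale) for e in finite)  # enforce non-negativity
    return [k if math.isinf(e) else round(e * scale) - shift for e in energies]

print(energies_to_costs([-1.237, 0.5, math.inf, 0.0]))
# → [0, 174, 1000000000, 124]
```

The shift is the same for every tuple of a function, so the ranking of complete assignments, and hence the set of optimal solutions, is preserved.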

[WPMS] Weighted Partial MaxSAT.

Since weighted partial MaxSAT is a CFN with Boolean variables and a language of clauses, a WPMS instance is already a CFN. For a CFN, we consider two encodings to WPMS based on CSP-to-SAT encodings: the direct encoding [6] and the tuple encoding introduced by Bacchus [7]. WPMS costs are non-negative integers, and the wcnf format allows expressing an upper bound that will be used to represent k, preserving the ability to prune.

Direct encoding:

for each variable i with domain size |D_i| > 2, we use one proposition d_{i,r} for each value r ∈ D_i. This proposition is true iff variable i is assigned the value r. To ensure that exactly one value is used for each variable, we encode At Most One (AMO) with hard clauses (¬d_{i,r} ∨ ¬d_{i,s}) for all i ∈ {1,…,n} and all r < s, r,s ∈ D_i, as well as At Least One (ALO) with one hard clause \((\bigvee _{r} d_{i,r})\) for each i. Boolean variables are directly encoded as propositions and do not require AMO/ALO clauses. Then, for each cost function w_S ∈ W and each tuple t ∈ D_S with w_S(t) > 0, we have a clause \((\bigvee _{i \in S} \neg d_{i,t[i]})\) with weight w_S(t), where d_{i,t[i]} denotes the proposition associated with assigning to variable i the value that it has in tuple t.
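The direct encoding can be sketched as follows. The literal representation is ours: each literal is a triple (i, r, polarity) standing for d_{i,r} or its negation, and each clause is paired with its weight, k marking hard clauses.

```python
from itertools import combinations

def direct_encoding(domains, functions, k):
    """Return weighted clauses (literals, weight); weight k marks hard clauses."""
    clauses = []
    for i, dom in enumerate(domains):
        if len(dom) > 2:  # Boolean variables need no AMO/ALO clauses
            clauses.append(([(i, r, True) for r in dom], k))        # ALO
            clauses += [([(i, r, False), (i, s, False)], k)
                        for r, s in combinations(dom, 2)]           # AMO
    for scope, table in functions:
        for t, w in table.items():
            if w > 0:  # one weighted clause per tuple of non-zero cost
                clause = [(i, t[j], False) for j, i in enumerate(scope)]
                clauses.append((clause, min(w, k)))
    return clauses
```

For example, a ternary-domain variable contributes one ALO and three AMO hard clauses, and a binary function with one soft and one forbidden tuple contributes one soft and one hard clause.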

Tuple encoding:

it encodes domains as in the direct encoding: we have a proposition d_{i,r} for each variable/value pair representing i = r, along with AMO/ALO clauses that enforce that each non-Boolean variable is assigned exactly one value. Nullary and unary cost functions are also represented as soft clauses exactly as in the direct encoding.

For each cost function w_S, |S| > 1, and each tuple t ∈ D_S with w_S(t) < k, we have a proposition p_{S,t}. For non-zero cost w_S(t) > 0, we have the soft clause (¬p_{S,t}) with weight w_S(t). This represents the cost to pay if the tuple t is used. For every variable i ∈ S, we have a hard clause (d_{i,t[i]} ∨ ¬p_{S,t}). These clauses enforce that if tuple t is used, its values t[i] must be used. Then, for each variable i ∈ S and each value r ∈ D_i, we have hard clauses \((\neg d_{i,r} \lor \bigvee _{t\in D_{S},t[i]=r,w_{S}(t)<k} p_{S,t})\) that enforce that if a value r ∈ D_i is used, one of the allowed tuples t ∈ D_S such that t[i] = r, w_S(t) < k must be used.
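The clause generation for one cost function in the tuple encoding can be sketched as below. Literals are symbolic strings ("d{i},{r}" and "p{t}", with a leading "-" for negation) of our own choosing; weight k marks hard clauses.

```python
def tuple_encoding_clauses(scope, domains, table, k):
    """Clauses for one cost function w_S with |S| > 1; weight k marks hard clauses."""
    clauses = []
    allowed = {t: w for t, w in table.items() if w < k}  # tuples that get p_{S,t}
    for t, w in allowed.items():
        if w > 0:
            clauses.append(([f"-p{t}"], w))                  # soft: cost of using t
        for j, i in enumerate(scope):
            clauses.append(([f"d{i},{t[j]}", f"-p{t}"], k))  # t used => its values used
    for j, i in enumerate(scope):
        for r in domains[i]:
            lits = [f"-d{i},{r}"] + [f"p{t}" for t in allowed if t[j] == r]
            clauses.append((lits, k))            # value used => some allowed tuple used
    return clauses
```

Note that forbidden tuples (cost ≥ k) receive no proposition at all: they are excluded from the disjunctions, which is what forbids them.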

For CSPs, it is known that Unit Propagation (UP) on the tuple encoding enforces arc consistency in the original CSP (the values deleted by enforcing AC have their corresponding literals set to false by UP) [7].

We express the asymptotic complexities of the two encodings in terms of the total number of tuples of cost 0 (t_0), k (t_k), or other (t_r) in the problem. For the direct encoding, this is O(nd² + t_k + t_r), while for the tuple encoding it is O(nd² + a(t_0 + t_r)), where n is the number of variables, d the maximum domain size, and a the maximum cost function arity. The hidden big-O constants are larger for the tuple encoding, which carries an additional linear factor a. In our experiments (see Table 2 in Section 4.1), we found that the tuple encoding is typically much larger, more than can be accounted for by the hidden constants. Hence it appears that our benchmark instances have many more tuples with zero cost than with infinite (k) cost (t_0 ≫ t_k).

Table 2 Number of instances and their total compressed (gzipped) size per format for each benchmark resource

[01LP] 0/1 Linear Programming.

The 01LP encodings of CFNs are similar to those for WPMS, using 0/1 variables. The additional expressivity of linear constraints enables further simplifications. These translations are used to generate 01LP in cplex “LP” format.

Direct encoding: AMO/ALO clauses are replaced by one linear constraint per non-Boolean variable i ∈ X: \({\sum }_{r\in D_{i}} d_{i,r} = 1\). For each cost function w_S, the soft clause encoding of a tuple t with non-zero soft cost 0 < w_S(t) < k is replaced by a linear constraint \({\sum }_{i \in S} (1-d_{i,t[i]}) + p_{S,t} \geq 1\) that forces the value of p_{S,t} to 1 if the tuple t is used. This p_{S,t} variable appears in the objective function with coefficient w_S(t). If t has cost k or above, a constraint \({\sum }_{i \in S} (1-d_{i,t[i]}) \geq 1\) is used and no term appears in the objective function.
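A sketch that emits the direct 01LP encoding as textual LP-like rows. The row syntax and variable names are illustrative, not the exact cplex "LP" format.

```python
def lp_direct(domains, functions, k):
    """Return (objective terms, constraint rows) for the direct 01LP encoding."""
    rows, objective = [], []
    for i, dom in enumerate(domains):
        if len(dom) > 2:  # one linear constraint per non-Boolean variable
            rows.append(" + ".join(f"d{i}_{r}" for r in dom) + " = 1")
    for fid, (scope, table) in enumerate(functions):
        for t, w in table.items():
            lhs = " + ".join(f"(1 - d{i}_{t[j]})" for j, i in enumerate(scope))
            if 0 < w < k:  # soft tuple: channel its cost through p_{S,t}
                p = "p" + str(fid) + "_" + "_".join(map(str, t))
                rows.append(f"{lhs} + {p} >= 1")
                objective.append(f"{w} {p}")
            elif w >= k:   # hard tuple: forbid it outright, no objective term
                rows.append(f"{lhs} >= 1")
    return objective, rows
```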

Tuple encoding: the same encoding as above is used for domains and for nullary and unary cost functions. For each cost function w_S, |S| > 1, each variable i ∈ S, and each value r ∈ D_i, a constraint \({\sum }_{t\in D_{S},t[i]=r,w_{S}(t)<k}p_{S,t} = d_{i,r}\) enforces that a value (i,r) is used iff a tuple t such that t[i] = r and w_S(t) < k is used. The same 0/1 variable p_{S,t} appears in the objective function with coefficient w_S(t) if 0 < w_S(t) < k.

This encoding was proposed by Koster [30] to encode Partial Constraint Satisfaction Problems. Since all the d_{i,r} are 0/1 variables, the constraints enforce that the p_{S,t} are also integral. We can therefore relax the integrality constraint on the p_{S,t} variables.

Assuming there are no costs in \(\{0,\infty \}\), for each cost function w_S, each variable i ∈ S, and each value r ∈ D_i, summing the linear constraints \({\sum }_{j \in S} (1-d_{j,t[j]}) + p_{S,t} \geq 1\) of the direct encoding over all tuples t ∈ D_S such that t[i] = r yields:

$$\begin{array}{@{}rcl@{}} M |S| - \sum\limits_{j \in S \setminus \{i\}} \frac{M}{|D_{j}|} (\sum\limits_{s \in D_{j}} d_{j,s}) - M d_{i,r} + \sum\limits_{t\in D_{S},t[i]=r}p_{S,t} & \geq & M\\ M |S| - \sum\limits_{j \in S \setminus \{i\}} \frac{M}{|D_{j}|} (1 ) - M d_{i,r} + \sum\limits_{t\in D_{S},t[i]=r}p_{S,t} & \geq & M\\ M |S| - \frac{M (|S| - 1)}{d} - M d_{i,r} + \sum\limits_{t\in D_{S},t[i]=r}p_{S,t} & \geq & M\\ M (|S|-1) - \frac{M (|S| - 1)}{d} - M d_{i,r} + \sum\limits_{t\in D_{S},t[i]=r}p_{S,t} & \geq & 0\\ \end{array} $$
$$\textrm{Thus,} \sum\limits_{t\in D_{S},t[i]=r}p_{S,t} \geq M (d_{i,r} - (|S|-1) (1 - \frac{1}{d}))$$

with \(d = \max _{j \in S \setminus \{i\}} |D_{j}|\) and M = |D_{S∖{i}}|, the size of the Cartesian product of all domains of S except D_i. If |S| = 2, then M = d, and \(M (d_{i,r} - (|S|-1) (1 - \frac {1}{d}))\) is either negative (d_{i,r} = 0) or equal to 1 (d_{i,r} = 1). Therefore, the direct encoding can be seen as a relaxation of the tuple encoding.

The continuous relaxation of the tuple encoding is known in the MRF field as the local polytope [20, 47, 50]. This polytope is interesting for several reasons. First, the dual of the local polytope is exactly the Optimal Soft Arc Consistency (OSAC) LP for CFNs described in [11, 12]. This polytope also underlies convergent message-passing bounds [20, 50] used for MRF optimization. Ignoring possible value pruning (by node consistency or substitutability [19]), OSAC, and therefore the local polytope bound too, is known to be stronger than any other soft arc consistency [11]. Second, the dual variables of this polytope can be directly interpreted as the amounts of cost shifted by the so-called Equivalence Preserving Transformations [13] that underlie soft arc consistencies. Therefore, existing soft arc consistencies that iteratively change blocks of costs can be analyzed as fast, incremental, approximate Block Coordinate Descent algorithms aiming at solving this dual LP [37]. This result establishes a strong link between 01LP solvers using the local polytope encoding and CFN/MRF solvers using soft arc consistencies or convergent message passing: in the absence of pruning, the LP bound will always be at least as strong as the soft arc consistency bounds.

The significance of this connection is further strengthened by a recent result showing that the local polytope (or its dual) is “universal” in the sense that any LP can be translated in linear time into a graphical model whose local polytope has the same optimum as the original LP [44]. Progress in solving this polytope (exactly, or approximately by soft arc consistencies or message passing) and in solving general LPs are therefore tightly linked.

[CP] Constraint Programming.

In [43], a translation of CFNs into crisp CSPs was proposed. In this transformation, the decision variables of the CFN are preserved and every cost function is reified into a constraint whose scope is augmented by one extra variable representing the assignment cost. This reification of costs into domain variables transforms a CFN into a crisp CSP with more variables and increased arities. Typically, unary and binary cost functions are converted into table constraints of arity two and three, respectively. Another extra cost variable encodes the objective function, connected by a sum constraint to all other cost variables. All the cost variables are non-negative integers with the same initial upper bound k as provided in the wcsp format. The same approach applies to WPMS, using reified Boolean expressions instead of table constraints to encode hard and soft clauses. The resulting CSP models are expressed in the minizinc [39] CP language.Footnote 3

The converse translation of CP models with a cost variable into a CFN (and then MRFs and WPMSs) that does not use cost variables is a complex task.Footnote 4 It requires identifying localFootnote 5 cost functions, starting from the objective variable, while removing intermediate cost variables. We implemented a corresponding prototype in numberjackFootnote 6 [22] reading the low-level flatzinc format [39]. Global constraints are decomposed into ternary cost functions in extension (tables with costs in \(\{0,\infty \}\), see [1]), requiring small input domain sizes.

4 Graphical model evaluation

We have collected a set of benchmarks and performed experiments using state-of-the-art solvers coming from several research areas.

4.1 Collection of benchmarks

To gather an extensive set of benchmarks representing optimization problems from various areas, we collected problems from different sources including deterministic (CFN, MaxCSP, WPMS), probabilistic (MRF, BN), as well as CP collections. Each collection contains several categories of instances, each category corresponding to a specific class of problems.

[MRF]:

the Probabilistic Inference Challenge (PIC) 2011 benchmark setFootnote 7 and the (5-ary) genetic linkage analysis problem [18] from the Uncertainty in Artificial Intelligence (UAI) 2008 EvaluationFootnote 8 were taken in uai MARKOV format with multiplicative potentials. This PIC challenge on approximate inference in probabilistic graphical models is dedicated to a variety of queries and we only considered the MAP/MPE query. We used a subset of the instances available in PIC 2011, excluding Alchemy, CSP, Promedas, and ProteinProtein.Footnote 9 These problems have been translated to CFNs in wcsp format (then to WPMS, 01LP, CP) using a \((-\log )\) transform followed by fixed decimal point representation with 2-digit precision after the decimal point (the precision is constrained by CP solvers that typically accept only 32-bit integers).Footnote 10

[CVPR]:

the Computer Vision and Pattern Recognition (CVPR) OpenGM2 benchmarkFootnote 11 [27] contains binary and ternary MRF instances in hdf5 format with additive potentials. We excluded Brain, Knott, and MatchingStereo/ted-gm instances because of their size (>1 GB), and ModularityClustering because it came from outside the computer vision community. ColorSeg, MatchingStereo, and PhotoMontage have integer energies, directly defining non-negative costs. For the others, we used 8-digit precision after the decimal point.

[CFLib]:

the CFLibFootnote 12 is a collection of CFN and MaxSAT problems expressed in wcsp format. We extracted problems that are directly available in the wcsp format and further translated them into dedicated minizinc models manually. The extracted benchmarks include combinatorial auctions [35], CELAR/GRAPH radio-link frequency assignment problems [10], Mendelian error correction problems on complex pedigrees [46], computational protein design problems [3] (with 2-digit precision), SPOT5 satellite scheduling problems [8], and uncapacitated warehouse location problems [33, 34].

[MaxCSP]:

all binary CSP categories with table constraints and at least one inconsistent instance (BlackHole, Langford, Quasi-group Completion Problem, Graph Coloring, random Composed, random 3-SAT EHI, and random Geometric, excluding pure random categories) from the CSP 2008 CompetitionFootnote 13 were translated from xcsp2.1/xml format to CFNs (as MaxCSPs) where allowed (resp. forbidden) tuples have zero (resp. unit) cost. We set k=1,000.

[WPMS]:

weighted partial MaxSAT instances coming from the MaxSAT 2013 EvaluationFootnote 14, including crafted MIPLib, DIMACS Max Clique, and industrial WPMS instances, have been directly encoded as CFNs, each clause being encoded as a cost function with just one non-zero cost tuple. Translation to MRF (resp. CP) was restricted to instances with small-arity clauses (resp. with 32-bit costs, excluding the WPMS/Upgradeability category).

[CP]:

we extracted a selection of CFN-decomposable CP problems from the MiniZinc Challenges 2012 & 2013.Footnote 15 Only the smallest instances in the FastFood, Golomb, and OnCallRostering categories could be decomposed in wcsp format using less than 1 GB per instance (resp. 1, 3, and 3 instances per category).

Together, these benchmark resources contain problems offering a large variety in terms of size, maximum arity or domain size, and cost range. The WPMS and CVPR categories have the highest numbers of variables (close to 1 million variables for WPMS/TimeTabling, half a million for CVPR/PhotoMontage and ColorSeg). The WPMS benchmark also has the largest arities (a weighted clause on 580 variables appears in Haplotyping). For the other benchmarks, the maximum arity varies from 2 to 5. Graph connectivities are usually very small for MRF & CVPR (often based on grid graphs where vertices represent pixels in images) and WPMS benchmarks. MRF/ObjectDetection, CFN/ProteinDesign, MaxCSP/Langford, and CVPR/Matching have complete graphs. MRF/ProteinFolding has the largest domain size (503 values). Most CVPR instances have very large cost ranges (8-digit precision), whereas MaxCSP instances contain only 0/1 costs. The emphasis between optimization and feasibility also varies a lot among the problems: almost all deterministic GM categories, except MaxCSPs and CFN/CELAR, contain forbidden (cost-k) tuples in their cost functions. In contrast, probabilistic GMs usually have no forbidden tuples (except for MRF/Linkage and DBN).

Table 2 reports the number of instances per benchmark resource and its gzipped size for the seven formulations. The uai format appears to be the most compact to express local functions as tables. It relies on a complete ordered table of costs, which does not require describing tuples, whereas the other formats explicitly describe the tuples associated with non-zero costs. The price to pay for this conciseness is the inability of the uai format to represent large-arity functions with few non-zero costs (such as large weighted clauses). As seen before, the tuple encoding is usually larger than the direct one, except for the MRF/CVPR LPs, where the local polytope is a good choice since there are almost no zero costs. CP instances benefit from global constraints in the minizinc language, which are decomposed into large tables in the other formats.

4.2 Experimental settings

We compared the state-of-the-art MRF solversFootnote 16 daooptFootnote 17 [42] (using its 1-hour settings), winner of PIC 2011, and toulbar2Footnote 18 [18, 34] (including Virtual Arc Consistency (VAC) preprocessing [11], dominance rule pruning [19], and hybrid best-first search [2]), winner of the MaxCSP 2008 and UAI 2010 & 2014 Evaluations, against the WPMS solver maxhsFootnote 19 [14, 15], winner of the crafted WPMS category of MaxSAT 2013, the CP solver gecode,Footnote 20 winner of the MiniZinc Challenges 2012, and IBM-ILOG cplex 12.6.0.0 (with parameters EPAGAP, EPGAP, and EPINT set to zero to avoid premature termination).

All computations were performed on a single core of AMD Opteron 6176 at 2.3 GHz and 8 GB of RAM with a 1-hour CPU time limit.Footnote 21

4.3 Experimental results

The number of instances solved in less than 1 hour, excluding translation times between formats, is available in Table 3. Resource-based cactus plots are shown in Fig. 1.Footnote 22 Beyond the number of problems solved and the mean CPU time on solved instances reported in this table, we refine our analysis in two ways. First, we summarize the evolution of lower and upper bounds for each algorithm over all instances in Fig. 2.

Table 3 Number of problems solved in less than 1 hour (N/A if RAM usage or 32-bit limit prevented encoding). In parentheses, mean CPU time in seconds on solved instances (’-’ if none). Bold is best. The first column contains the category name followed by s: nb. of instances, d: max. dom. size, a: max. arity
Fig. 1 Cactus plots for MRF, CVPR, CFN, MaxCSP, WPMS, and CP benchmark resources

Fig. 2 Normalized lower and upper bounds on all instances (left) and a set of 1208 hardest instances (right)

Specifically, for each instance I we normalize all costs as follows: the initial lower bound produced by toulbar2 (before VAC) is 0; the best – but potentially suboptimal – solution found by any algorithm is 1; the worst solution is 2. This normalization is invariant to translation and scaling. Additionally, we normalize time from 0 to 1 for each pair of algorithm A and instance I, so that each run finishes at time 1. This time normalization is different for different instances and for different algorithms on the same instance. A point 〈x,y〉 on the lower bound line for algorithm A in Fig. 2 means that after normalized runtime x, algorithm A has proved on average over all instances a normalized lower bound of y and similarly for the upper bound. We show both the upper and lower bound curves for all algorithms evaluated here, except gecode which produces no meaningful lower bound before it proves optimality. In order for the last point of each curve to be visible, we extend all curves horizontally after 1.0. Additionally, on the right of Fig. 2, we show the same curves but excluding instances that took less than 5 seconds to solve with a simple version of toulbar2 that does not use either VAC preprocessing or hybrid best first search, for a final set of 1208 instances. We remove those easy instances because the runtime tends to be dominated by whatever preprocessing technique each solver uses. This means that the optimal solution is reported near the end of the search, although it is early in absolute terms.

Note that this plot highlights different aspects of the solvers’ behavior than the cactus plots and should be interpreted in conjunction with those.

In the second part of our analysis, we compute global measures that try to compensate for the very different cardinalities of the categories. For each instance, we compute two Z-scores,Footnote 23 one for the CPU time and another for the cost of the best solution found at the deadline. In the extreme case where a solver is the only one able to solve an instance (resp. is not able to solve it), we use a score of −4 (resp. 4). A mean Z-score is then computed for each category and the sum of all mean Z-scores is reported in Table 3.

To take into account the CPU time and cost in a common measure, we also computed Borda scores, following the MiniZinc Challenge's approach. For each instance and each pair of solvers, a reward in [0,1] is granted to each solver as follows: if a solver reports a better cost than the other, it is granted a reward of 1 (and 0 for the other). For identical costs, if t_0 and t_1 are the CPU times of the two solvers denoted 0 and 1, solver i receives a reward of \(\frac {t_{|i-1|}}{t_{0}+t_{1}}\), favoring the faster solver. A mean Borda score is computed for each category and the sum of mean scores is reported.
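The pairwise reward rule above can be sketched as follows (the function name is ours):

```python
# Sketch of the pairwise Borda reward: the better cost gets the full point,
# and ties split the point in inverse proportion to CPU time.
def borda_rewards(cost0, t0, cost1, t1):
    if cost0 != cost1:
        return (1.0, 0.0) if cost0 < cost1 else (0.0, 1.0)
    return (t1 / (t0 + t1), t0 / (t0 + t1))  # faster solver gets the larger share

print(borda_rewards(10, 5.0, 10, 15.0))  # → (0.75, 0.25)
```

With equal costs, a solver three times faster than its opponent receives three quarters of the point, matching the \(t_{|i-1|}/(t_0+t_1)\) formula.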

The tuple encoding and CP approaches are not applicable to WPMS and CVPR, respectively. Quite fairly, these measures penalize these approaches for this limitation. To see if this penalty was enough to explain the scores of these approaches, we also report the Borda score normalized by the number of applicable categories (this optimistically assumes that these approaches would work as well on these inaccessible benchmarks as on the rest of the benchmarks). The only change is a swap of the order between cplex with the tuple encoding and maxhs with the direct encoding.

As expected, for each source of benchmarks, the best solver (in terms of number of solved instances) is usually a solver that is dedicated to this type of problems (i.e., toulbar2 for CFN, maxhs for WPMS, gecode for CP). However, some solvers, such as maxhs, cplex, and toulbar2, performed well on several resources, respectively solving to optimality 2,043, 2,313, and 2,433 instances among a total of 3,026 (using the best encoding on each category for maxhs and cplex). Using the number of solved instances per category, breaking ties by best mean CPU time on solved instances, these three solvers won the first position on 9, 10, and 16 categories respectively, among 43 categories. Looking at cactus plots in Fig. 1, toulbar2 and cplex dominate on MRF&CVPR, followed by daoopt. They also dominate on CFN, followed by maxhs. toulbar2 performed well on MaxCSP. maxhs and gecode dominate on their own category (resp. WPMS and CP).

In terms of extreme size and solving difficulty, the CVPR/ColorSeg/colseg-cow4 instance defines the largest search space (d^n = 2^{829,440}) completely solved by toulbar2. MRF/ObjectDetection is the smallest totally unsolved category (d^n ≤ 2^{264}). We now consider each benchmark resource, highlighting unexpected results.

[MRF]:

on MRF/Linkage [28] (maximum number of variables n=1289, maximum domain size d=7), cplex with the tuple encoding, followed by maxhs, obtained the best results, showing their suitability for non-binary (max. arity a=5) cost functions with forbidden tuples. The tuple encoding is the only 01LP encoding usually considered for MRFs. Surprisingly, cplex with the direct encoding was the best on the Grid category (n=6400, d=2), benefiting from a large number of zero-half cuts. daoopt did not perform as well as in the PIC 2011 Evaluation. One explanation is the missing problem reformulation feature used in the PIC challenge [42], a piece of code which is not available in source or binary form. Another explanation is that daoopt spends more time finding good upper bounds (using local search in preprocessing) than on the optimality proof (as Fig. 2 seems to show). CP solvers performed poorly on MRFs due to the large costs, resulting in huge domains for the cost variables in the CFN-to-CP translation.

[CVPR]:

on CVPR/Scene Decomposition, using a superpixel model [27] with fewer variables (n=208, d=8), toulbar2 solved all 715 instances in 0.07 seconds each on average, compared to 1.11 for cplex with the tuple encoding. The good performance of toulbar2 on CVPR instances is largely due to its virtual arc consistency initial problem reformulation [11]. On these problems, it offers a tight lower bound in much less time than LP on the tuple encoding. This encoding was always better for cplex, consistent with the ubiquity of the local polytope formulation as a linear relaxation for MRFs. The tuple encoding also improved the performance of the MaxSAT solver (see the maxhs tuple-encoding results in Table 3) on two categories (ProteinInteraction, SceneDecomp).

[CFN]:

toulbar2 clearly dominates on CELAR (n=458, d=44), Pedigree (n=10017, d=28), and ProteinDesign (n=18, d=198), whereas cplex with the direct encoding, followed by maxhs, performed best on the Operations Research problems Auction (n=246, d=2) and Warehouse (n=1100, d=300). The 01LP tuple encoding still performed quite well when the problem size remained relatively small (n×d ≤ 20,000); otherwise, memory errors sometimes occurred, as on the largest Warehouse instances (capa-b-c-m). gecode performed relatively well on Auction and Warehouse, solving three large instances (capmo-3-4-5 with n=200, d=100).

[MaxCSP]:

maxhs performed well on MaxCSP due to its ability to quickly solve all the satisfiable (zero-cost optimum) instances that remained in the Geometric (n=50, d=20) and QCP (n=264, d=9) categories, thanks to its embedded minisat solver. The good results obtained by daoopt can similarly be explained by its initial stochastic local search procedure [42], which finds good initial upper bounds, especially on satisfiable or random instances like EHI (n=315, d=7) and Geometric. toulbar2 won the first position on four MaxCSP categories, especially the random EHI category, thanks to its new hybrid best-first search strategy [2], which simulates restarts with memory. Surprisingly, the tuple encoding was always dominated by the direct encoding here.

[WPMS]:

large clause arities make the tuple encoding, as well as the use of exhaustive tables in the uai format, intractable in space. While maxhs dominates the scene, it is interesting to notice the ability of cplex to outperform maxhs on two categories (Upgradeability and PackupWeighted). On PackupWeighted (n=25554, d=2), cplex can be up to one order of magnitude faster than maxhs. gecode was the fastest solver to find and prove optimality on 11 MaxClique (n=3321, d=2) instances, whereas maxhs won this category by solving 40 instances among 62.

[CP]:

CP instances are difficult to translate into GMs with local functions and small domains: 10 of the 35 minizinc instances could not be translated for space reasons. Moreover, the translation is not appropriate for LP solvers (linear constraints are decomposed), which explains the poor performance of cplex. Here, gecode performed best in most cases. However, maxhs performed best on two categories: Amaze and OnCallRostering. Similarly, daoopt was faster than gecode on the most difficult ParityLearning instance (52_26_6.3). daoopt solved all the instances in preprocessing thanks to its complete bucket/variable elimination [16], with memory usage below 529 MB (induced width less than 25), within its limits (4 GB and an i-bound of 35) [42].

With either the tuple or the direct encoding, cplex was the best in at least one category per benchmark resource (except for CP), showing very good robustness. For probabilistic models, the tuple encoding is the ideal choice since the emphasis is on optimization (essentially no tuple with cost k). In this case, the tuple formulation offers a strong bound, an essential source of pruning. In several cases, however, thanks to its incremental soft arc consistencies and strong virtual arc consistency preprocessing, toulbar2 outperformed cplex on such problems. These results can be analyzed in the light of the known relations between LP and soft arc consistencies [11]: thanks to pruning by node consistency and substitutability, and to their efficiency and strong incrementality, soft arc consistencies seem capable of outperforming LP by finding a better trade-off between tightness and computational cost on the local (universal) polytope.

On the other benchmarks, however, the direct encoding was always preferable. This can be explained by the greater conciseness of this encoding on benchmarks with many 0 costs, as shown in Table 2, and, to some extent, by the weaker pruning of the optimization bound in the presence of hard constraints. This encoding seems essentially ignored in the MRF community.
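To give an intuition for this conciseness argument, the following toy sketch counts the 0/1 variables each encoding would introduce for a single sparse binary cost function. The counting rule is a simplifying assumption (the direct encoding materializes a variable only per non-zero-cost tuple, the tuple encoding one per tuple), not the paper's exact constraint systems; the function and numbers are illustrative only.

```python
# Toy variable count for the two 01LP encodings of one binary cost function.
# Assumption (hedged): direct encoding = value variables + one variable per
# non-zero-cost tuple; tuple encoding = value variables + one per tuple.

def encoding_sizes(cost_table, domain_size):
    """cost_table maps (value_i, value_j) -> cost; absent tuples cost 0."""
    n_tuples = domain_size * domain_size
    nonzero = sum(1 for c in cost_table.values() if c != 0)
    direct_vars = 2 * domain_size + nonzero   # value vars + non-zero tuples
    tuple_vars = 2 * domain_size + n_tuples   # value vars + all tuples
    return direct_vars, tuple_vars

# A table with many 0 costs (only equal value pairs are penalized):
d = 8
table = {(v, v): 10 for v in range(d)}
print(encoding_sizes(table, d))  # (24, 80): direct is far more concise here
```

With dense cost tables (few 0 costs), the two counts converge, which is consistent with the tuple encoding remaining competitive on the probabilistic benchmarks.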

Overall, this shows that significant speedups can be achieved by exploiting encodings to different optimization languages.

5 Exploitation: a portfolio approach

Solver portfolios [21, 23, 32] aim to exploit this diversity by replacing a single solver with a set of complementary solvers and a mechanism for selecting a subset to use on a particular problem. By making decisions at an instance-specific level, it is possible to achieve significant performance gains over any of the individual component solvers. Solver portfolios have been highly successful in constraint programming [4, 24, 41], satisfiability [26, 51], MaxSAT [5], and many other fields. For an extensive survey of the wide range of literature on the algorithm selection problem, we refer the reader to [32].

The majority of modern portfolio approaches employ some form of machine learning to play the role of the selection model. This requires a training phase in which, for a reference set of instances, a domain-specific feature description, a candidate set of algorithms, and a performance metric are defined. Feature descriptions for each instance and performance data for each algorithm on each instance are recorded, and the machine learning model is built so that the performance metric is maximized on this training data. Subsequently, to apply the trained model to a new test instance at runtime, its feature description is first computed and passed to the model, which makes a solver selection. The chosen solver is then applied to the problem instance.
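The training-then-selection loop can be sketched in a few lines. Everything below is illustrative: the feature vectors and runtimes are invented, and a simple 1-nearest-neighbour rule stands in for the learned selection model (the actual portfolio uses the methods of Sec. 5.2).

```python
# Hedged sketch of instance-specific solver selection: pick the solver that
# was fastest on the most similar training instance. All data is made up.
import math

# Training data: (feature vector, per-solver CPU time in seconds).
train = [
    ([0.9, 4.0], {"toulbar2": 1.0, "cplex": 50.0, "maxhs": 30.0}),
    ([0.1, 2.0], {"toulbar2": 40.0, "cplex": 2.0, "maxhs": 25.0}),
    ([0.2, 2.5], {"toulbar2": 35.0, "cplex": 3.0, "maxhs": 20.0}),
]

def best_solver(times):
    """The fastest solver on one instance (the training label)."""
    return min(times, key=times.get)

def select(features):
    """Selection model: 1-NN over the training features."""
    nearest = min(train, key=lambda ex: math.dist(features, ex[0]))
    return best_solver(nearest[1])

print(select([0.85, 3.9]))  # near the first instance -> "toulbar2"
```

A real selector would replace `select` with a trained classifier or regressor, but the interface (features in, solver name out) is the same.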

5.1 Graphical model instance features

To describe graphical model instances, we consider the following feature set: i) the input file size; ii) the CPU time to read the instance; iii) an initial upper bound on the solution; iv) the time to compute this initial upper bound; v) the number of variables; vi) the number of cost functions; the ratio of vii) unary, viii) binary, and ix) ternary cost functions, i.e., the fraction of the total number of cost functions of each arity; x) the ratio of cost functions of arity 4 or greater; and finally, a number of statistics (mean, standard deviation, coefficient of variation, minimum, and maximum) of xi) the domain sizes and xii) the cost function arities. This is by no means a comprehensive list of features for graphical models; nevertheless, in initial evaluations these features proved effective and have the benefit of being relatively cheap to compute.
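Features v)–xii) above are straightforward to compute from the instance structure alone. The sketch below does so for a toy instance; the instance representation (a list of domain sizes and a list of cost function scopes) is a hypothetical simplification, not the uai or wcsp file format.

```python
# Structural features of a graphical model instance (toy representation).
from statistics import mean, pstdev

domains = [3, 3, 5, 2, 4]  # domain size of each variable
scopes = [(0,), (1,), (0, 1), (2, 3), (1, 2, 4), (0, 1, 2, 3)]  # fn scopes

def stats(xs):
    """Mean, std. dev., coefficient of variation, min, max."""
    m, s = mean(xs), pstdev(xs)
    return {"mean": m, "std": s, "cv": s / m, "min": min(xs), "max": max(xs)}

arities = [len(s) for s in scopes]
n_funcs = len(scopes)
features = {
    "n_vars": len(domains),
    "n_funcs": n_funcs,
    "ratio_unary": arities.count(1) / n_funcs,
    "ratio_binary": arities.count(2) / n_funcs,
    "ratio_ternary": arities.count(3) / n_funcs,
    "ratio_arity4plus": sum(a >= 4 for a in arities) / n_funcs,
    "domain_stats": stats(domains),
    "arity_stats": stats(arities),
}
print(features["ratio_binary"], features["domain_stats"]["min"])
```

Features i)–iv) (file size, read time, initial upper bound and its time) additionally require running the parser and an upper-bounding procedure, which is why the set as a whole remains relatively cheap but not free.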

Table 4 presents the Gini importances [9] of the above features according to a decision tree classifier aiming to predict the fastest solver. The most important features are the ratio of binary cost functions, the minimum domain size, and the value of the initial upper bound.

Table 4 Gini importances of features

5.2 Machine learning offline evaluation results

Table 5 presents an offline evaluation of a simple portfolio approach based on six solvers from Sec. 4.3. We consider a subset of the benchmarks and solvers such that every instance could be translated for every solver, i.e., we exclude the WPMS and CP benchmarks, and the gecode solver.

Table 5 Summary of portfolio approaches sorted by decreasing number of problems solved over the 2,564 instances

The portfolio is built using llama [31], with 10-fold stratified cross-validation. This involves splitting the dataset into 10 equally sized folds with an equal distribution of the best solver across folds. For brevity, we present results only for the best-performing regression, classification, and clustering methods, plus the Random Forest classifier. The Virtual Best Solver (VBS) corresponds to an oracle choosing the best solver for each instance. The table lists the mean (std. dev.) CPU time on the solved instances, the number of instances solved to optimality in less than 1 hour, and the number of times each solver was the fastest. In addition, the misclassification penalty shows the contribution of each solver to the portfolio, i.e., the number of instances that were not solved by any other solver and, where another solver did solve the instance, the additional CPU time needed by the next best solver. From these statistics alone, it is clear that each of the component solvers (except maxhs with the tuple encoding) makes a valuable contribution to the portfolio, both by solving more instances and by reducing the overall CPU time needed. Additionally, each of the portfolio methods outperforms the single best solver and closes most of the gap to the virtual best solver.
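The VBS and misclassification-penalty statistics can be computed directly from a matrix of per-instance solver runtimes. The sketch below does this for a toy matrix with invented values; the timeout and the exact accounting (charging the full timeout when no other solver solves the instance) are assumptions matching the description above, not the evaluation scripts themselves.

```python
# Portfolio statistics from a toy runtime matrix. None = not solved in time.
TIMEOUT = 3600.0

runtimes = {  # instance -> solver -> CPU seconds
    "i1": {"toulbar2": 2.0, "cplex": 100.0, "maxhs": None},
    "i2": {"toulbar2": None, "cplex": 30.0, "maxhs": 45.0},
    "i3": {"toulbar2": 5.0, "cplex": None, "maxhs": None},
}

def vbs_time(times):
    """Oracle (Virtual Best Solver) time on one instance."""
    solved = [t for t in times.values() if t is not None]
    return min(solved) if solved else None

def misclassification_penalty(solver):
    """Extra CPU time the portfolio would pay if `solver` were removed:
    on each instance where it was the fastest, charge the next best
    solver's time (or the full timeout if no other solver finishes)."""
    penalty = 0.0
    for times in runtimes.values():
        t = times[solver]
        if t is None or t != vbs_time(times):
            continue  # this solver was not the fastest here
        others = [v for s, v in times.items() if s != solver and v is not None]
        penalty += (min(others) if others else TIMEOUT) - t
    return penalty

print(misclassification_penalty("toulbar2"))  # 98.0 on i1 + 3595.0 on i3
```

In this toy matrix, toulbar2 is the only solver that finishes on i3, so most of its penalty comes from the charged timeout, mirroring how uniquely solved instances dominate a solver's contribution in Table 5.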

5.3 The UAI 2014 portfolio

A specific portfolio was developed and submitted to the UAI 2014 Inference Competition (MAP task). It was built from five constituent solvers: i) toulbar2, ii) a version of toulbar2 taking a starting solution from an initial run of the incop [40] local search solver, iii) the Message Passing Linear Programming mplp2 solver [48, 49], iv) cplex using the direct encoding, and v) cplex with the tuple encoding. These solvers were selected based on their complementary performances in previous empirical evaluations. Table 6 presents the results of an offline evaluation of this portfolio.

Table 6 Offline evaluation of the UAI 2014 portfolio on 2,564 instances

The effectiveness of this multi-language portfolio was independently verified in the UAI 2014 Inference Competition, where it achieved two first places in the MAP task, under both the 20- and 60-minute timeouts. Three of the portfolio's component solvers were submitted to the same competition as independent entries. The two 01LP encodings performed extremely well on certain instances but extremely poorly on the rest. Based on the competition's overall evaluation metric, the cumulative sum of a solver's rank on each instance, the 01LP encodings did not rank high overall but were the top-ranked solvers in a number of cases. Likewise, the incop+toulbar2 solver was the highest ranked in some cases but ranked mid-field in many others. The UAI'14 portfolio solver was highly successful in deciding when to run these solvers, achieving first place overall. This independent empirical evaluation supports the finding demonstrated in this paper: significant speedups can be achieved by exploiting various encodings to related languages.

6 Conclusions

Our empirical results demonstrate the effectiveness of a number of solvers on various graphical model formats, with no single solver consistently dominating. Rather, the best solver depends on the problem category, bringing to light the respective strengths, robustness, and weaknesses of each solver family. These results highlight the efficacy of encoding a problem into a related language and exploiting complementary solving technologies.

We demonstrated that it is possible to exploit these complementary strengths using a portfolio approach; a portfolio built on this knowledge won the UAI 2014 Evaluation. We hope that our proposed collection of benchmarks, readily available in many formats, will enrich the various competitions in CP, AI, and OR, leading to more robust solvers and new solving strategies.