1.1 Introduction

Mathematical programming models play a vital role in the socio-economic development of modern society. Optimization frameworks such as linear programming, quadratic programming, combinatorial optimization, and mixed integer programming are used effectively in engineering design, finance, healthcare, economics, medicine, transportation, supply chains, the environment, and telecommunications, among others. Perhaps the most fundamental among the various optimization modeling tools is linear programming. This model building framework, as we know it today, originated around 1938 and evolved into an applicable modelling technique in the early 1950s with the discovery of the simplex method [14]. Realization of the benefits associated with having integrality restrictions on some or all of the decision variables of a linear program led to the development of integer programming, and significant advancements in this area continue to emerge [42].

The literature on unconstrained nonlinear optimization, on the other hand, can be traced back to the early days of calculus [66]. However, the development of successful methods for solving constrained nonlinear optimization problems took a very long time. Special nonlinear programs that minimize convex quadratic functions over linear constraints can now be solved very efficiently. The study of constrained quadratic optimization problems started around the mid 1950s with applications in portfolio optimization [56, 57]. Systematic investigations of the quadratic programming problem with integrality restrictions on the decision variables began only a decade later [34, 53]. This book addresses a versatile unconstrained quadratic programming model called the quadratic unconstrained binary optimization problem (QUBO), where the decision variables take values 0 or 1. The literature on QUBO is quite extensive, and for a quick overview we refer the reader to the survey papers [28, 45, 48].

Before getting into the technical details, let us start with some basic notation. All matrices are represented using bold capital letters, and elements of a matrix are represented by the corresponding small letters along with accents, if any, and the location coordinates. For example, the (i, j)th element of the matrix A is a ij, that of the matrix \(\bar {\mathbf {B}}\) is \(\bar {b}_{ij}\), and that of the matrix D k is \(d^k_{ij}\). Similarly, vectors are represented by boldface small letters along with appropriate accents, as applicable, and elements of a vector are represented using the same letter (without boldface) along with its location coordinate and accents, if any. For example, the ith element of the vector c is c i, that of the vector x k is \(x^k_i\), and that of the vector \(\tilde {\mathbf {v}}\) is \(\tilde {v}_i\). Exceptions to this rule will be stated explicitly, and operators such as transpose are not considered as accents in the case of vectors. The zero vector in any dimension is represented by 0. Additional notation will be introduced as the need arises.

Let us now present a formal mathematical definition of QUBO. Let Q be an n × n real matrix, c be a row vector from \(\mathbb {R}^n\), and x be a row vector from {0, 1}n. Then, QUBO is defined as the mathematical programming problem:

$$\displaystyle \begin{aligned} \begin{array}{ll} \mbox{Maximize } & f(\mathbf{x})=\mathbf{x}\mathbf{Q}\mathbf{x}^T+\mathbf{c}\mathbf{x}^T\\ \mbox{Subject to } & \mathbf{x}\in \{0,1\}^n. \end{array} \end{aligned}$$

Since \(x_i^2=x_i\) for any binary variable x i, we can represent QUBO without the linear term by simply adding c i to q ii for i = 1, 2, …, n. Alternatively, we can assume q ii to be zero by replacing c i with c i + q ii. In fact, QUBO is represented in many other alternative forms, which are discussed in more detail later in this chapter. QUBO is sometimes presented as a minimization problem, different from the standard maximization form. The data Q, c along with the orientation of optimization completely define a problem instance. Thus, a maximization instance of QUBO can be represented by the triple \((\mathbf {Q},\mathbf {c},\max )\) and a minimization instance of QUBO can be represented by \((\mathbf {Q},\mathbf {c},\min )\). Note that the instances \((\mathbf {Q},\mathbf {c},\max )\) and \((-\mathbf {Q},-\mathbf {c},\min )\) are equivalent. Unless otherwise specified, we assume that QUBO is presented in the maximization form and we normally represent such an instance by the ordered pair (Q, c).
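As a quick illustration of these conventions, the following sketch checks on a small random instance (our own illustrative choice, in Python with NumPy) that \((\mathbf {Q},\mathbf {c},\max )\) and \((-\mathbf {Q},-\mathbf {c},\min )\) have matching optimal values and that the linear term can be absorbed into the diagonal:

```python
# A minimal brute-force check of two claims above, on a small random
# instance; the data and function names are illustrative assumptions.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 4
Q = rng.integers(-5, 6, (n, n)).astype(float)
c = rng.integers(-5, 6, n).astype(float)

def f(Q, c, x):
    # Objective value x Q x^T + c x^T for a 0-1 row vector x.
    return x @ Q @ x + c @ x

best_max = max(f(Q, c, np.array(x)) for x in itertools.product((0, 1), repeat=n))

# (Q, c, max) and (-Q, -c, min) are equivalent: optimal values are negatives.
best_min = min(f(-Q, -c, np.array(x)) for x in itertools.product((0, 1), repeat=n))
assert np.isclose(best_max, -best_min)

# Absorbing the linear term: add c_i to q_ii and drop c (uses x_i^2 = x_i).
Q2 = Q + np.diag(c)
best_absorbed = max(f(Q2, np.zeros(n), np.array(x))
                    for x in itertools.product((0, 1), repeat=n))
assert np.isclose(best_absorbed, best_max)
```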

Let us now look at an interpretation of QUBO from the point of view of matrix theory. A principal submatrix of Q is a square submatrix obtained by deleting the rows and columns with indices outside a given index set S ⊆{1, 2, …, n}; thus the rows and columns of the submatrix are indexed by S. The value of a matrix is the sum of its elements. Without loss of generality assume c = 0. Then QUBO is precisely the problem of computing a principal submatrix of Q with maximum value.
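The following small sketch (illustrative data) verifies this interpretation: with c = 0 and S = {i : x i = 1}, the objective value equals the sum of the entries of the principal submatrix indexed by S.

```python
# Principal submatrix view of QUBO: f(x) = sum of Q over rows/columns in S.
import numpy as np

Q = np.array([[2., -1., 3.],
              [0., 4., -2.],
              [1., 0., -5.]])
x = np.array([1., 0., 1.])          # S = {0, 2} in 0-based indices
S = np.flatnonzero(x)
submatrix_value = Q[np.ix_(S, S)].sum()
assert np.isclose(submatrix_value, x @ Q @ x)   # both equal 1.0 here
```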

A graph theoretic interpretation of QUBO can be given as follows. Let G = (V, E) be a graph such that V = {1, 2, …, n} and (i, j) ∈ E if and only if q ij ≠ 0. The graph G is called the support graph of Q. Let q ij be the cost of edge (i, j) ∈ E and q ii + c i be the cost of the vertex i ∈ V . Let S be a subset of V  and G(S) be the subgraph of G induced by S. The cost of G(S) is defined as the sum of the cost of its edges and vertices. Then the QUBO seeks a subset S of V  such that the cost of the induced subgraph G(S) of G is maximized.

Literature on QUBO, presented as a 0-1 quadratic program, can be traced back to the 1960s, particularly to the work of Hammer and Rudeanu [34] on pseudo-Boolean functions and the works [13, 20, 21, 33, 80, 81]. Graph theoretic optimization problems such as the maximum weight stable set problem, the maximum weight clique problem, and the maximum cut problem have a hidden QUBO structure. As we will see later, these graph theoretic optimization problems are indeed equivalent to QUBO. In this sense, the history of QUBO is also linked to the origins of these graph theoretic structures.

A stable set of a graph G = (V, E) is a subset S of V  such that no two vertices of S are adjacent in G. Let d i be a given weight associated with vertex i ∈ V . Then, the maximum weight stable set problem is to find a stable set S in G such that ∑i∈S d i is maximized. It is easy to formulate the maximum weight stable set problem as a QUBO. Consider the binary decision variables x i for i = 1, 2, …, n, where x i = 1 represents the event that vertex i is in the stable set. Then, for any stable set S and any edge (i, j) ∈ E, at most one of the vertices i and j belongs to S, and hence x i x j = 0. This constraint can be enforced using a quadratic penalty term in the objective function. Define

$$\displaystyle \begin{aligned} q_{ij}= \begin{cases} d_i & \mbox{ if } i=j \\ -M & \mbox{ if } (i,j)\in E\\ 0 & \mbox{ if } (i,j) \notin E, \end{cases} \end{aligned}$$

where M is a large positive number. Also, choose c as the zero vector. Then, the maximum weight stable set problem can be solved as the QUBO instance (Q, c) where Q and c are as defined above. Interestingly, it is possible to formulate any instance (Q, c) of QUBO as a maximum weight stable set problem and this will be discussed in Sect. 1.4 along with the other equivalent forms of QUBO.
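A minimal sketch of this construction follows; the graph, vertex weights, and choice of M are illustrative assumptions (any M exceeding the total vertex weight suffices):

```python
# Build the penalty matrix Q above for a small graph and solve by brute force.
import itertools
import numpy as np

edges = [(0, 1), (1, 2), (2, 3)]     # a path on 4 vertices (0-based)
d = np.array([3., 5., 4., 2.])       # vertex weights
M = 1 + d.sum()                      # penalty larger than any stable set value

Q = np.diag(d)
for i, j in edges:
    Q[i, j] = -M                     # selecting both endpoints is penalized

vals = {v: np.array(v) @ Q @ np.array(v)
        for v in itertools.product((0, 1), repeat=4)}
x_best = max(vals, key=vals.get)
print(x_best, vals[x_best])          # (0, 1, 0, 1): stable set {1, 3}, value 7.0
```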

A clique in the graph G is a subset S of V  such that the subgraph G(S) of G induced by S is a complete graph. It is easy to see that S is a clique in G if and only if S is a stable set in the complement of G. Thus, in view of the discussion above, the maximum weight clique problem can also be formulated as a QUBO and it is also equivalent to QUBO.

Thus, from a historical perspective, cliques and stable sets play an indirect role in the evolution of QUBO. The literature on computing maximum weight cliques and maximum weight stable sets is quite extensive, and we will not attempt a detailed review of the historical developments of these structures. However, we briefly discuss below some of the earlier works that introduced these notions. Although the clique (stable set) is a well-established concept in graph theory at present, the terminology has its roots in the social sciences [19, 22, 55]. Luce and Perry [55] in 1949 defined a clique as follows:

A subset of the group forms a clique provided that it consists of three or more members each in the symmetric relation to each other member of the subset, and provided further that there can be found no element outside the subset that is in the symmetric relation to each of the elements of the subset.

The first algorithmic results on computing cliques were presented by Harary and Ross [36]. Without using the name ‘clique’, the concepts of complete subgraphs and independent sets (same as stable sets) were considered by Erdős and Szekeres [17] in 1935 while discussing Ramsey theory in the context of graphs.

QUBO can also be written as a continuous optimization problem [70, 71]. Consider the box-constrained quadratic program

$$\displaystyle \begin{aligned} \max\{\mathbf{x}\mathbf{Q}\mathbf{x}^T+\mathbf{c}\mathbf{x}^T+M\mathbf{x}(\mathbf{x}-\mathbf{e})^T : \mathbf{x}\in [0,1]^n\}, \end{aligned}$$

where M is a large positive number and e is the all-one vector in \(\mathbb {R}^n\). Note that the added term \(M\mathbf {x}(\mathbf {x}-\mathbf {e})^T=M\sum _{i=1}^n(x_i^2-x_i)\) vanishes on {0, 1}n and, for sufficiently large M, the objective function becomes convex; hence an optimal solution is attained at an extreme point of [0, 1]n [70, 71]. The collection of extreme points of [0, 1]n is precisely {0, 1}n, establishing the validity of the above representation of QUBO.

QUBO is strongly NP-hard. A thorough complexity analysis of the problem and various polynomially solvable special cases are discussed in Chap. 3, and complexity results in connection with approximability are discussed in Chap. 8.

1.2 The Ising Model

Long before QUBO became popular within the operations research community, the model was used in statistical mechanics in an alternative form [75]. This is popularly known as the Ising model, where the variables take values from {−1, 1}. The Ising model was originally proposed by Ernst Ising and Wilhelm Lenz in the 1920s to understand the behaviour of magnetic materials. This model considers a magnetic material as a set of molecules. The molecules have spins which ‘align’ or ‘anti-align’ when a magnetic field is introduced and have pairwise interactions with each other [3, 5]. Let V = {1, 2, …, n} be a collection of molecules and let x i ∈{−1, 1} represent the spin of molecule i. Let b i be the strength of the magnetic field applied on molecule i and let a ij represent the joint interaction field between the neighbouring spins i and j. For a given spin vector x = (x 1, x 2, …, x n), the corresponding energy value is defined as

$$\displaystyle \begin{aligned}E(\mathbf{x})=\sum_{i=1}^n\sum_{j=1}^na_{ij}x_ix_j + \sum_{i=1}^nb_ix_i.\end{aligned}$$

At low temperature levels, the system tends to low energy states, and hence the sign of a ij indicates the likely relative direction of the spins. Thus, when a ij > 0 the system favors anti-aligned neighbouring spins, i.e. x i x j = −1, making a ij x i x j < 0. Likewise, when a ij < 0 the spins x i and x j are likely to align, i.e. x i x j = 1, again making a ij x i x j < 0. This simple hypothetical model has been validated through experimental considerations by physicists for decades [10, 40, 61,62,63]. This underlying model is the basis of some of the current quantum inspired computing machines [5], which essentially bring a system to its lowest energy level and thereby indirectly compute the minimum of the energy function (equivalently, the maximum of −E(x)) over variables that take values −1 or 1.

The associated optimization problem, presented in maximization form, can be stated as

$$\displaystyle \begin{aligned} \max\{{\mathbf{y}}^T\mathbf{A}\mathbf{y}+\mathbf{b}\mathbf{y} : \mathbf{y}\in \{-1,1\}^n\}, \end{aligned}$$

where A is an n × n real valued matrix, b is a row vector from \(\mathbb {R}^n\), and y is a column vector in {−1, 1}n. We refer to this version of QUBO as the Ising QUBO.

The models QUBO and the Ising QUBO are equivalent from an optimality point of view. Note that the linear transformation \(x_i=\frac {1}{2}(y_i+1)\) for i = 1, 2, …, n reduces the QUBO (Q, c) to the Ising QUBO (A, b) where \(\mathbf {A}=\frac {1}{4}\mathbf {Q}\), \(b_i=\frac {1}{4}\left (2c_i+\sum _{j=1}^n(q_{ij}+q_{ji})\right )\) for i = 1, 2, …, n, along with an additive constant \(q_0=\frac {1}{4}\left (\sum _{i=1}^n\sum _{j=1}^nq_{ij}+2\sum _{i=1}^nc_i\right )\), which can be discarded. Similarly, an instance (A, b) of the Ising QUBO can be reduced to an instance (Q, c) of QUBO using the transformation y i = 2x i − 1 for i = 1, 2, …, n with Q = 4A and \(c_i=2b_i-2\sum _{j=1}^n(a_{ij}+a_{ji})\), along with the additive constant \(\sum _{i=1}^n\sum _{j=1}^na_{ij}-\sum _{i=1}^nb_i\).
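The following sketch numerically checks the first of these transformations on a random instance (an illustrative assumption), with the constant q 0 as given above:

```python
# Verify the QUBO -> Ising transformation on all binary vectors.
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 3
Q = rng.integers(-4, 5, (n, n)).astype(float)
c = rng.integers(-4, 5, n).astype(float)

A = Q / 4.0
b = (2 * c + Q.sum(axis=1) + Q.sum(axis=0)) / 4.0   # b_i as defined above
q0 = (Q.sum() + 2 * c.sum()) / 4.0                   # additive constant

for xs in itertools.product((0, 1), repeat=n):
    x = np.array(xs, dtype=float)
    y = 2 * x - 1                                    # x_i = (y_i + 1) / 2
    assert np.isclose(x @ Q @ x + c @ x, y @ A @ y + b @ y + q0)
```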

The original Ising model was in fact a special case of the Ising QUBO. There are various extensions and generalizations of this popular model. For a historical account of various developments related to this model, we refer to the survey papers [10, 40, 61,62,63]. Although the name ‘Ising model’ is widely accepted within the physics community, it is not without controversy. For example, Barry Simon, in his book The Statistical Mechanics of Lattice Gases [75], writes:

Lattice models are caricatures invented to illuminate various aspects of elementary statistical mechanics, especially the phenomena of phase transitions and spontaneously broken symmetry. The simplest of all models is the Ising (or Lenz-Ising) model and this model was suggested to Ising by his thesis adviser, Lenz. Ising solved the one-dimensional model, ..., and on the basis of the fact that the one-dimensional model had no phase transition, he asserted that there was no phase transition in any dimension. As we shall see, this is false. It is ironic that on the basis of an elementary calculation and erroneous conclusion, Ising’s name has become among the most commonly mentioned in the theoretical physics literature. But history has had its revenge. Ising’s name, which is correctly pronounced “E-zing,” is almost universally mispronounced “I-zing.”

Such discussions are probably important in physics to put contributions to scientific developments in context. We continue to use the terminology QUBO and Ising QUBO to distinguish between the nature of the underlying variables and to recognize the linkages between the Ising model and Ising QUBO.

For QUBO, we have seen that the linear term can be absorbed into the Q matrix, or the diagonal of Q can be extracted and added to the linear term, without altering the optimal solution set. In the Ising QUBO, it is possible to discard the diagonal elements of A since y i 2 = 1 and hence these elements simply contribute a constant value to any feasible solution. Thus, it is customary to assume that the diagonal entries of A are zeros. Further, it is possible to reformulate the Ising QUBO without the linear term [18, 38]. Define the (n + 1) × (n + 1) matrix \(\bar {\mathbf {A}}\) where

$$\displaystyle \begin{aligned} \bar{a}_{ij}= \begin{cases} a_{ij}, & \mbox{ if } 1\leq i,j \leq n \\ \frac{1}{2}b_i & \mbox{ for } j=n+1,\ i=1,2,\ldots ,n\\ \frac{1}{2}b_j & \mbox{ for } i=n+1,\ j=1,2,\ldots ,n\\ 0 & \mbox{ for } i=j=n+1. \end{cases} \end{aligned}$$

Now, consider the Ising QUBO

$$\displaystyle \begin{aligned} \mbox{IQB: } \max\{{\mathbf{y}}^T\bar{\mathbf{A}}\mathbf{y} : \mathbf{y}\in \{-1,1\}^{n+1}\}. \end{aligned}$$

Note that if y is a solution to the Ising QUBO IQB then −y is also a solution, and both y and −y have the same objective function value in IQB. Thus, without loss of generality we can assume that y n+1 = 1 in an optimal solution to IQB. With y n+1 = 1, the terms involving row and column n + 1 of \(\bar {\mathbf {A}}\) contribute precisely \(\sum _{i=1}^nb_iy_i\), and hence IQB is equivalent to the Ising QUBO (A, b).
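A short numerical sketch of this homogenization, with random illustrative data:

```python
# With y_{n+1} = 1, the bordered matrix A-bar reproduces y^T A y + b y.
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

Abar = np.zeros((n + 1, n + 1))
Abar[:n, :n] = A
Abar[:n, n] = b / 2.0            # last column holds b_i / 2
Abar[n, :n] = b / 2.0            # last row holds b_j / 2

y = rng.choice([-1.0, 1.0], size=n)
z = np.append(y, 1.0)            # fix y_{n+1} = 1
assert np.isclose(z @ Abar @ z, y @ A @ y + b @ y)
# Flipping all signs leaves the objective unchanged: z and -z tie.
assert np.isclose((-z) @ Abar @ (-z), z @ Abar @ z)
```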

The validity of the Ising model is not restricted to physics. Its relevance has been established in various other fields such as finance, biology, psychology, and sociology. For example, the Ising model can be adapted to study human behaviour by interpreting the similarity between the behaviour of molecules and the behaviour of humans, as argued by Galam [24]. Specific applications of the Ising QUBO from this point of view can be found in [49, 77]. Applications of QUBO and the Ising QUBO are covered in detail in Chap. 8.

1.3 Representations of the Q Matrix

Let \(f_{Q,c}(\mathbf {x})=\mathbf {x}\mathbf {Q}\mathbf {x}^T+\mathbf {c}\mathbf {x}^T\) denote the objective function of the instance (Q, c) of QUBO. We remove the suffix Q, c from \(f_{Q,c}\) when the explicit consideration of Q and c is unimportant. An instance (Q′, c′) of QUBO is an equivalent representation of the instance (Q, c) if

  1. \(f_{Q,c}(\mathbf {x}) = f_{Q',c'}(\mathbf {x})\) for all x ∈{0, 1}n

  2. Q and Q′ belong to \(\mathbb {R}^{n\times n}\), and c and c′ belong to \(\mathbb {R}^n\).

Equivalent representations, although preserving optimality, could generate instances with different structural properties that may be exploited to obtain computational advantages. Let us now look at some simple and commonly used equivalent representations of QUBO [65]. More involved equivalent representations and corresponding applications in computing strong lower bounds are discussed in Chaps. 6 and 7.

Remark 1.1

\((\mathbf {Q}^T,\mathbf {c})\) and \(\left (\frac {1}{2}(\mathbf {Q}+\mathbf {Q}^T),\mathbf {c}\right )\) are equivalent representations of (Q, c).

The proof of this remark is straightforward, since \(\mathbf {x}\mathbf {Q}\mathbf {x}^T=\mathbf {x}\mathbf {Q}^T\mathbf {x}^T\) for any x. Note that \(\frac {1}{2}(\mathbf {Q}+\mathbf {Q}^T)\) is a symmetric matrix. Thus, it is possible to assume without loss of generality that the matrix Q in the definition of QUBO is symmetric, and this assumption is required for some of the algorithms to solve the problem and in computing certain types of lower bounds. For any matrix \(\mathbf {Q}\in \mathbb {R}^{n\times n}\), diagV(Q) denotes the diagonal vector of Q; i.e., diagV(Q) is the vector of size n whose ith element is q ii.
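A quick numerical check of Remark 1.1 on a random instance (illustrative data):

```python
# Q, Q^T and (Q + Q^T)/2 give the same objective value on every x.
import numpy as np

rng = np.random.default_rng(3)
Q = rng.standard_normal((4, 4))
x = rng.integers(0, 2, 4).astype(float)

sym = (Q + Q.T) / 2.0
assert np.isclose(x @ Q @ x, x @ Q.T @ x)
assert np.isclose(x @ Q @ x, x @ sym @ x)
```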

Remark 1.2 ([69])

If S is a skew-symmetric matrix, D is a diagonal matrix, Q′ = Q + S + D, and c′ = c −diagV(D), then (Q′, c′) is an equivalent representation of (Q, c).

The proof of Remark 1.2 follows from the fact that \(\mathbf {x}\mathbf {S}\mathbf {x}^T=0\) for any skew-symmetric matrix S and \(x_i^2=x_i\) for all binary variables x i. Choosing S and D appropriately, we can get different standard representations of Q. For example, choose D such that d ii = −q ii for i = 1, 2, …, n and choose S such that

$$\displaystyle \begin{aligned} s_{ij} = \begin{cases} q_{ji} &\text{if}\ j > i\\ -q_{ij} &\text{if}\ j < i\\ 0 & \text{otherwise}. \end{cases} \end{aligned}$$

Then, the resulting matrix Q′ is upper triangular with zeros on the diagonal. This is a common representation used in the QUBO literature and is also assumed in some of the chapters in this book. If we choose S as the zero matrix and d ii = M for i = 1, 2, …, n, where M is a sufficiently large nonnegative number, the resulting matrix Q′ will be positive semidefinite. Consequently, without loss of generality one may assume Q to be positive semidefinite. When Q is symmetric and not positive semidefinite, choosing M to be the negative of the smallest eigenvalue of Q is sufficient to make Q′ positive semidefinite [32]. This makes the continuous relaxation of the objective function of QUBO a convex quadratic function. In a similar way, choosing M as a sufficiently small negative number, we can get a representation of QUBO where the Q′ matrix is negative semidefinite, thereby making the continuous relaxation of the objective function of QUBO a concave quadratic function.
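The sketch below builds both representations for a random instance (illustrative data): the upper triangular form via Remark 1.2, and the positive semidefinite form via symmetrization followed by the eigenvalue shift:

```python
# Two equivalent representations of (Q, c), checked on all binary vectors.
import numpy as np

rng = np.random.default_rng(4)
n = 4
Q = rng.standard_normal((n, n))
c = rng.standard_normal(n)

# Upper triangular: fold q_ji into q_ij for j > i, move diagonal into c.
Q_ut = np.triu(Q + Q.T, k=1)
c_ut = c + np.diag(Q)

# PSD: symmetrize first (Remark 1.1), then shift the diagonal by
# M = -lambda_min and compensate in the linear term.
Qs = (Q + Q.T) / 2.0
M = max(0.0, -np.linalg.eigvalsh(Qs).min())
Q_psd = Qs + M * np.eye(n)
c_psd = c - M * np.ones(n)

for bits in np.ndindex(*(2,) * n):
    x = np.asarray(bits, dtype=float)
    v = x @ Q @ x + c @ x
    assert np.isclose(v, x @ Q_ut @ x + c_ut @ x)
    assert np.isclose(v, x @ Q_psd @ x + c_psd @ x)
```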

A QUBO (Q, c) of size n can be represented as a QUBO (Q′, 0) of size n + 1 such that the row and column sums of Q′ are zeros. We call this the zero sum representation. Without loss of generality assume c = 0. From the instance (Q, 0) construct the matrix Q′ as

$$\displaystyle \begin{aligned} q^{\prime}_{ij} = \begin{cases} q_{ij} &\text{if}\ i,j\leq n\\ -\sum_{k=1}^nq_{ik} &\text{if}\ j=n+1, i\leq n\\ -\sum_{k=1}^nq_{kj} & \text{if}\ i=n+1, j\leq n\\ \sum_{k=1}^n\sum_{\ell=1}^nq_{k\ell}&\text{if}\ i=n+1,j=n+1 \end{cases} \end{aligned}$$

It can be verified that the row and column sums of the matrix Q′ are zeros. Consequently, the sum of all elements of Q′ is also zero. If x n+1 is zero in an optimal solution for (Q′, 0) then (Q′, 0) is equivalent to (Q, 0). Let \(\bar {\mathbf {x}}=\mathbf {e}-\mathbf {x}\) where e is the all one vector in \(\mathbb {R}^{n+1}\).

Lemma 1.1

If row and column sums of Q′ are zeros, then \(\mathbf {x}\mathbf {Q}'\mathbf {x}^T=\bar {\mathbf {x}}\mathbf {Q}'\bar {\mathbf {x}}^T\) for all x ∈{0, 1}n+1.

Proof

\(\bar {\mathbf {x}}\mathbf {Q}'\bar {\mathbf {x}}^T=(\mathbf {e}-\mathbf {x})\mathbf {Q}'(\mathbf {e}-\mathbf {x})^T=\mathbf {e}\mathbf {Q}'\mathbf {e}^T-\mathbf {e}\mathbf {Q}'\mathbf {x}^T-\mathbf {x}\mathbf {Q}'\mathbf {e}^T+\mathbf {x}\mathbf {Q}'\mathbf {x}^T=\mathbf {x}\mathbf {Q}'\mathbf {x}^T\). The last equality follows from the fact that row and column sums of Q′ are zeros, which makes \(\mathbf {e}\mathbf {Q}'=\mathbf {0}\) and \(\mathbf {Q}'\mathbf {e}^T=\mathbf {0}^T\). □

From Lemma 1.1, if x is an optimal solution to (Q′, 0), then \(\bar {\mathbf {x}}\) is also an optimal solution. Thus, x n+1 = 0 in one of these optimal solutions of (Q′, 0) and hence (Q′, 0) and (Q, 0) are equivalent.
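The following sketch builds the zero sum representation for a small random instance (illustrative data) and checks the properties used above:

```python
# Border Q with compensating row/column sums; verify the zero-sum property.
import numpy as np

rng = np.random.default_rng(5)
n = 3
Q = rng.integers(-3, 4, (n, n)).astype(float)

Qp = np.zeros((n + 1, n + 1))
Qp[:n, :n] = Q
Qp[:n, n] = -Q.sum(axis=1)          # negated row sums
Qp[n, :n] = -Q.sum(axis=0)          # negated column sums
Qp[n, n] = Q.sum()

assert np.allclose(Qp.sum(axis=0), 0) and np.allclose(Qp.sum(axis=1), 0)

# With x_{n+1} = 0, the bordered instance reproduces (Q, 0); and by
# Lemma 1.1, x and e - x always have equal objective values.
x = np.array([1.0, 0.0, 1.0, 0.0])
assert np.isclose(x @ Qp @ x, x[:n] @ Q @ x[:n])
xbar = 1.0 - x
assert np.isclose(xbar @ Qp @ xbar, x @ Qp @ x)
```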

1.4 Some Equivalent Optimization Problems

We have seen that the problem of computing a maximum value principal submatrix of an n × n matrix, the maximum weight induced subgraph problem, and the Ising QUBO are alternative ways of representing a QUBO. In this section, we discuss some additional problems that are equivalent to QUBO, and these equivalent forms are used in some of the chapters to follow to derive structural properties and to develop solution algorithms.

1.4.1 QUBO as a Bilinear Program

In this subsection, we assume that Q is symmetric and positive semidefinite. As discussed in the previous section, this assumption is without loss of generality. Then the continuous relaxation of QUBO is

$$\displaystyle \begin{aligned} \mbox{QUBO(C): }\max\{f(\mathbf{x})=\mathbf{x}\mathbf{Q}\mathbf{x}^T+\mathbf{c}\mathbf{x}^T : \mathbf{x}\in [0,1]^n\}. \end{aligned}$$

Since Q is symmetric and positive semidefinite, f(x) is a convex function and hence there exists an optimal solution to QUBO(C) which is attained at an extreme point of the hypercube [0, 1]n. Now consider the hypercube bilinear program

$$\displaystyle \begin{aligned} \mbox{HBLP: }\max\left\{\eta(\mathbf{x},\mathbf{y})=\frac{1}{2}\mathbf{c}\mathbf{x}^T+\frac{1}{2}\mathbf{c}\mathbf{y}^T+\mathbf{x}\mathbf{Q}\mathbf{y}^T : \mathbf{x},\mathbf{y}\in [0,1]^n\right\},\end{aligned}$$

so that η(x, x) = f(x) for all x ∈ [0, 1]n.

Lemma 1.2 ([51])

There exists an optimal solution \(({\mathbf {x}}^*,{\mathbf {y}}^*)\) to HBLP such that both \({\mathbf {x}}^*\) and \({\mathbf {y}}^*\) are extreme points of the hypercube [0, 1]n.

Proof

Suppose that (x 0, y 0) is an optimal solution to HBLP. Now fix y = y 0 in HBLP and let \(\bar {\mathbf {x}}\) be an optimal extreme point solution of the resulting linear program. Then \((\bar {\mathbf {x}},{\mathbf {y}}^0)\) is an optimal solution to HBLP. Next fix \(\mathbf {x}=\bar {\mathbf {x}}\) in HBLP and let \(\bar {\mathbf {y}}\) be an optimal extreme point solution of the resulting linear program. Then \((\bar {\mathbf {x}},\bar {\mathbf {y}})\) is an optimal solution to HBLP and this completes the proof. □
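The proof above is constructive, and the following sketch mimics it: fixing one block of variables makes η linear in the other, and a linear function over [0, 1]n is maximized at a vertex by sign inspection. The instance and the fixed iteration count are illustrative assumptions.

```python
# Alternating vertex ascent for the hypercube bilinear program HBLP.
import numpy as np

rng = np.random.default_rng(6)
n = 4
B = rng.standard_normal((n, n))
Q = B.T @ B                          # symmetric positive semidefinite
c = rng.standard_normal(n)

def eta(x, y):
    return 0.5 * c @ x + 0.5 * c @ y + x @ Q @ y

x = rng.random(n)                    # start anywhere in the box
y = rng.random(n)
for _ in range(20):
    x = (0.5 * c + Q @ y > 0).astype(float)   # best vertex response in x
    y = (0.5 * c + Q @ x > 0).astype(float)   # best vertex response in y

# Both iterates are now 0-1 vertices, and eta never decreased along the way.
print(x, y, eta(x, y))
```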

Let us now prove a property of a symmetric positive semidefinite matrix which is used in the proof of the theorem that follows.

Lemma 1.3

If Q is a symmetric positive semidefinite matrix such that \(\mathbf {x}\mathbf {Q}\mathbf {x}^T=0\), then \(\mathbf {Q}\mathbf {x}^T=\mathbf {0}^T\).

Proof

Since Q is symmetric and positive semidefinite, there exists a matrix B such that \(\mathbf {Q}=\mathbf {B}^T\mathbf {B}\). Then, \(\mathbf {x}\mathbf {Q}\mathbf {x}^T=\mathbf {x}\mathbf {B}^T\mathbf {B}\mathbf {x}^T=||\mathbf {B}\mathbf {x}^T||^2\). Thus, \(\mathbf {x}\mathbf {Q}\mathbf {x}^T=0\) implies \(\mathbf {B}\mathbf {x}^T=\mathbf {0}^T\) and hence \(\mathbf {Q}\mathbf {x}^T=\mathbf {B}^T\mathbf {B}\mathbf {x}^T=\mathbf {0}^T\). □

We now show that QUBO is equivalent to HBLP.

Theorem 1.1 ([50, 51])

If \(({\mathbf {x}}^*,{\mathbf {y}}^*)\) is an optimal extreme point solution of HBLP then both \({\mathbf {x}}^*\) and \({\mathbf {y}}^*\) are optimal solutions of QUBO(C). Conversely, if \({\mathbf {x}}^*\) is an optimal extreme point solution of QUBO(C) then \(({\mathbf {x}}^*,{\mathbf {x}}^*)\) is an optimal solution to HBLP.

Proof

Let \(({\mathbf {x}}^*,{\mathbf {y}}^*)\) be an optimal extreme point solution of HBLP and \({\mathbf {x}}^0\) be an optimal extreme point solution of QUBO(C). Then,

$$\displaystyle \begin{aligned} f({\mathbf{x}}^0) \geq f({\mathbf{x}}^*) \mbox{ and } f({\mathbf{x}}^0) \geq f({\mathbf{y}}^*). \end{aligned} $$
(1.1)

Further,

$$\displaystyle \begin{aligned} \eta({\mathbf{x}}^*,{\mathbf{y}}^*) &=\max\{\eta(\mathbf{x},\mathbf{y}) : \mathbf{x},\mathbf{y}\in [0,1]^n\}\\ {} &\geq \max\{\eta(\mathbf{x},\mathbf{x}) : \mathbf{x}\in [0,1]^n\}=f({\mathbf{x}}^0) \end{aligned} $$
(1.2)

Since \(({\mathbf {x}}^*,{\mathbf {y}}^*)\) is an optimal solution to HBLP,

$$\displaystyle \begin{aligned} \eta({\mathbf{x}}^*,{\mathbf{y}}^*)-\eta({\mathbf{x}}^*,{\mathbf{x}}^*) \geq 0 \mbox{ and } \eta({\mathbf{x}}^*,{\mathbf{y}}^*)-\eta({\mathbf{y}}^*,{\mathbf{y}}^*) \geq 0 \end{aligned} $$
(1.3)

But

$$\displaystyle \begin{aligned} \eta({\mathbf{x}}^*,{\mathbf{y}}^*)-\eta({\mathbf{x}}^*,{\mathbf{x}}^*)=\frac{1}{2}\mathbf{c}({\mathbf{y}}^*-{\mathbf{x}}^*)^T+{\mathbf{x}}^*\mathbf{Q}({\mathbf{y}}^*-{\mathbf{x}}^*)^T \end{aligned} $$
(1.4)

$$\displaystyle \begin{aligned} \eta({\mathbf{x}}^*,{\mathbf{y}}^*)-\eta({\mathbf{y}}^*,{\mathbf{y}}^*)=\frac{1}{2}\mathbf{c}({\mathbf{x}}^*-{\mathbf{y}}^*)^T+{\mathbf{y}}^*\mathbf{Q}({\mathbf{x}}^*-{\mathbf{y}}^*)^T \end{aligned} $$
(1.5)

From (1.3), (1.4) and (1.5) we have,

$$\displaystyle \begin{aligned} \frac{1}{2}\mathbf{c}({\mathbf{y}}^*-{\mathbf{x}}^*)^T+{\mathbf{x}}^*\mathbf{Q}({\mathbf{y}}^*-{\mathbf{x}}^*)^T \geq 0 \end{aligned} $$
(1.6)

$$\displaystyle \begin{aligned} \frac{1}{2}\mathbf{c}({\mathbf{x}}^*-{\mathbf{y}}^*)^T+{\mathbf{y}}^*\mathbf{Q}({\mathbf{x}}^*-{\mathbf{y}}^*)^T \geq 0 \end{aligned} $$
(1.7)

Adding (1.6) and (1.7) and using the symmetry of Q, we get \(-({\mathbf {x}}^*-{\mathbf {y}}^*)\mathbf {Q}({\mathbf {x}}^*-{\mathbf {y}}^*)^T\geq 0\). Since Q is positive semidefinite, this forces \(({\mathbf {x}}^*-{\mathbf {y}}^*)\mathbf {Q}({\mathbf {x}}^*-{\mathbf {y}}^*)^T=0\) and, from Lemma 1.3, \(\mathbf {Q}({\mathbf {x}}^*-{\mathbf {y}}^*)^T=\mathbf {0}^T\). Substituting this in (1.6) and (1.7) we get \(\mathbf {c}({\mathbf {x}}^*-{\mathbf {y}}^*)^T=0\). Thus from (1.4) and (1.5) we have

$$\displaystyle \begin{aligned}\eta({\mathbf{x}}^*,{\mathbf{y}}^*)=\eta({\mathbf{x}}^*,{\mathbf{x}}^*)=\eta({\mathbf{y}}^*,{\mathbf{y}}^*).\end{aligned}$$

But \(\eta ({\mathbf {x}}^*,{\mathbf {x}}^*)=f({\mathbf {x}}^*)\) and \(\eta ({\mathbf {y}}^*,{\mathbf {y}}^*)=f({\mathbf {y}}^*)\). Thus,

$$\displaystyle \begin{aligned} \eta({\mathbf{x}}^*,{\mathbf{y}}^*)=f({\mathbf{x}}^*)=f({\mathbf{y}}^*) \end{aligned} $$
(1.8)

From (1.1) and (1.8), \(\eta ({\mathbf {x}}^*,{\mathbf {y}}^*)\leq f({\mathbf {x}}^0)\) and the result follows in view of (1.2) and (1.8). □

1.4.2 The Maximum Cut Problem and QUBO

Let G = (V, E) be a graph with V = {1, 2, …, n} and for each edge (i, j) ∈ E a weight w ij is prescribed. For any S ⊆ V , the pair (S, V ∖ S) is called a cut in G. The weight of the cut (S, V ∖ S) is

$$\displaystyle \begin{aligned}w(S,V\setminus S)=\sum_{i\in S, j\in {V\setminus S}, (i,j)\in E}w_{ij}.\end{aligned}$$

In general, our definition of a cut allows the possibility that S = ∅ or S = V  and in either case w(S, V ∖ S) = 0. The minimum cut problem seeks a cut (S, V ∖ S) with minimum w(S, V ∖ S) value. When w ij ≥ 0 for all (i, j) ∈ E, we normally do not permit S = ∅ or V  for the minimum cut problem since otherwise the trivial solution S = ∅ is optimal. Likewise, the maximum cut problem seeks a cut (S, V ∖ S) with maximum w(S, V ∖ S) value. When w ij ≥ 0 for all (i, j) ∈ E, the minimum cut problem is solvable in polynomial time [44] but the maximum cut problem is NP-hard [23]. When w ij is allowed to take positive and negative values, both minimum cut and maximum cut problems are NP-hard. We now observe that QUBO and the maximum cut problem are essentially the same, presented under different frameworks [7, 33, 38].

Let x ∈{0, 1}n and \(\bar {\mathbf {x}}=\mathbf {e-x}\) where e is the all-one vector in \(\mathbb {R}^n\). An x ∈{0, 1}n is the incidence vector of S ⊆ V  if and only if x i = 1 for all i ∈ S and x i = 0 for all i∉S. Let Q be the weighted adjacency matrix of G; that is, q ij = q ji = w ij if (i, j) ∈ E (i.e. i and j are adjacent in G) and zero, otherwise. Note that Q is symmetric. Then, for any cut \((S,\bar S)\) (with \(\bar S=V\setminus S\)) and the incidence vector x of S, we have \(w(S,\bar S)=\mathbf {x}\mathbf {Q}\bar {\mathbf {x}}^T\). For example, consider the graph below with the associated edge weights and the corresponding matrix Q (Fig. 1.1).

Fig. 1.1 An instance of WMCP

Now, choose the cut \((S,\bar S)\) where S = {1, 4} and let x be the incidence vector associated with S. Then, the weight of the cut is \(w(S,\bar S)=\mathbf {x}\mathbf {Q}\bar {\mathbf {x}}^T\).

Thus, the maximum cut problem can be written as

$$\displaystyle \begin{aligned} \max\{\mathbf{x}\mathbf{Q}\bar{\mathbf{x}}^T : \mathbf{x}\in \{0,1\}^n\}. \end{aligned} $$
(1.9)

For calculating \(\mathbf {x}\mathbf {Q}\bar {\mathbf {x}}^T\), the diagonal elements of Q are irrelevant since \(x_i\bar {x}_i=0\). For any n × n matrix Q,

$$\displaystyle \begin{aligned} \mathbf{x}\mathbf{Q}\bar{\mathbf{x}}^T=\sum_{i=1}^nr_ix_i-\mathbf{x}\mathbf{Q}\mathbf{x}^T, \end{aligned} $$
(1.10)

where \(r_i= \sum _{j=1}^nq_{ij}\). Define the n × n matrix \(\hat {\mathbf {Q}}=\left (\hat {q}_{ij}\right )\) where

$$\displaystyle \begin{aligned} \hat{q}_{ij} = \begin{cases} -q_{ij} &\text{if}\ i\neq j\\ r_i-q_{ii} &\text{if}\ i=j. \end{cases} \end{aligned}$$

Then \(\mathbf {x}\hat {\mathbf {Q}}\mathbf {x}^T=\mathbf {x}\mathbf {Q}\bar {\mathbf {x}}^T\) for all x ∈{0, 1}n, establishing that the maximum cut problem can be formulated as a QUBO. We now observe that any instance, say (Q, c), of QUBO can be formulated as a maximum cut problem. To see this, from Eq. (1.10),

$$\displaystyle \begin{aligned} f_{Q,c}(\mathbf{x})=\mathbf{x}\mathbf{Q}\mathbf{x}^T+\mathbf{c}\mathbf{x}^T=\sum_{i=1}^n(r_i+c_i)x_i-\mathbf{x}\mathbf{Q}\bar{\mathbf{x}}^T, \end{aligned} $$
(1.11)

where r = (r 1, r 2, …, r n).

Now, consider the graph G′ = (V, E′) where V = {0, 1, 2, …, n}. The edge set E′ = E ∪ E 0 where E = {(i, j) : q ij ≠ 0;i, j = 1, 2, …, n, i ≠ j} and E 0 = {(0, i) : r i + c i ≠ 0}. Now, define the weight w ij of the edge (i, j) as − q ij if (i, j) ∈ E and − (r j + c j) for (0, j) ∈ E 0. Let \(\alpha = \sum _{i=1}^n(r_i+c_i)\). For any cut \((S,\bar S)\) in G′, without loss of generality assume that the node 0 ∈ S and consider the binary variables x 1, x 2, …, x n with x i = 1 if and only if i ∈ S. Let \(w(S,\bar S)\) be the value of the cut \((S,\bar S)\) in G′ with the edge weight function w. Then, it can be verified that \(f_{Q,c}(\mathbf {x})=\alpha +w(S,\bar S)\). Thus, any instance (Q, c) of the QUBO with n variables can be formulated as a maximum weight cut problem on a graph with n + 1 nodes.

We now illustrate this construction using the following example. Let

$$\displaystyle \begin{aligned}\mathbf{Q}= \begin{pmatrix} 0 &10& -5& -8& 6\\ 10& 0& 3& 2& -4\\ -5& 3& 0& 5& 0\\ -8& 2& 5& 0& 8\\ 6& -4& 0& 8& 0 \end{pmatrix} \mbox{ and } \mathbf{c} = \begin{pmatrix} -3\\ -5\\ 11\\ -3\\ -2 \end{pmatrix}. \end{aligned}$$

Then, r = (3, 11, 3, 7, 10) and α = 32. The graph G′, constructed following the procedure discussed above, is given in Fig. 1.2. The numbers on the edges are the corresponding weights. Edges with weight zero are removed from the graph.

Fig. 1.2 The instance of WMCP equivalent to (Q, c)
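The construction can be checked numerically. The sketch below verifies the identity \(f_{Q,c}(\mathbf {x})=\alpha +w(S,\bar S)\) on the instance above, with the cut value computed according to the ordered-pair convention used in the text:

```python
# Check f(x) = alpha + w(S, S-bar) for every cut of the 5-variable instance.
import itertools
import numpy as np

Q = np.array([[0, 10, -5, -8, 6],
              [10, 0, 3, 2, -4],
              [-5, 3, 0, 5, 0],
              [-8, 2, 5, 0, 8],
              [6, -4, 0, 8, 0]], dtype=float)
c = np.array([-3., -5., 11., -3., -2.])
r = Q.sum(axis=1)
alpha = (r + c).sum()                 # = 32

for bits in itertools.product((0, 1), repeat=5):
    x = np.array(bits, dtype=float)
    xbar = 1.0 - x
    # cut value in G': edges from S to its complement, plus the edges
    # from node 0 (always placed in S) to the nodes outside S
    w_cut = -(x @ Q @ xbar) - (r + c) @ xbar
    assert np.isclose(x @ Q @ x + c @ x, alpha + w_cut)
```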

The QUBO formulation of the maximum cut problem is also evident from the sum of squares formulation of the maximum cut problem given by

$$\displaystyle \begin{aligned}\max_{S\subseteq V} w(S,V\setminus S)=\max_{\mathbf{x}\in \{0,1\}^n}\sum_{(i,j)\in E}w_{ij}(x_i-x_j)^2.\end{aligned}$$

Another popular formulation of the maximum cut problem is obtained by using variables that take values − 1 or 1. Define y i = 1 if i ∈ S and y i = −1 if i ∈ V ∖ S. Assume that the edges of the graph G are labelled such that (i, j) ∈ E implies i < j and w ij = 0 if (i, j)∉E. Consider the symmetric matrix W with its (i, j)th element w ij. Then,

$$\displaystyle \begin{aligned} w(S,V\setminus S)=\sum_{(i,j)\in E}w_{ij}\frac{1-y_iy_j}{2}=\frac{1}{4}{\mathbf{y}}^T\mathbf{L}\mathbf{y}, \end{aligned}$$

where L = D −W and D is a diagonal matrix such that \(d_{ii}=\sum _{j=1}^nw_{ij}\) assuming w ij = 0 for (i, j)∉E. Thus, the maximum cut problem can be formulated as the Ising QUBO \((\frac {1}{4}\mathbf {L},\mathbf {0})\).

Let G = (V, E) be a weighted graph with edge weight w ij for each (i, j) ∈ E and V = {1, 2, …, n}. Choose w ij = 0 if (i, j)∉E and define the weight matrix W as the n × n matrix with its (i, j)th element w ij. Let D be the weighted degree matrix, which is the diagonal matrix with \(d_{ii}=\sum _{j=1}^nw_{ij}\). Then the weighted Laplacian matrix of G is L = D − W. The matrix L is positive semidefinite and the row and column sums of the matrix are zero. Note that the matrix L constructed in our reduction from the maximum weight cut problem to the Ising QUBO is precisely the Laplacian matrix of G.

In fact, any Ising QUBO can be written as a maximum cut problem as well. To see this, consider an instance (A, b) of the Ising QUBO in n variables. Without loss of generality, assume that A is symmetric and b is the zero vector. Now, consider the graph G = (V, E) where V = {1, 2, …, n} and E = {(i, j) : a ij ≠ 0}. Choose the weight of the edge (i, j) ∈ E as − a ij and let L be the corresponding weighted Laplacian matrix. Then, A = L + D where D is a diagonal matrix with d ii = −l ii + a ii. For any y ∈{−1, 1}n,

$$\displaystyle \begin{aligned} {\mathbf{y}}^T\mathbf{A}\mathbf{y}={\mathbf{y}}^T\mathbf{L}\mathbf{y}+{\mathbf{y}}^T\mathbf{D}\mathbf{y}={\mathbf{y}}^T\mathbf{L}\mathbf{y}+tr(\mathbf{D}), \end{aligned}$$

where tr(D) is the trace of D, which is a constant. Thus an optimal solution to the Ising QUBO (A, 0) is obtained by solving the Ising QUBO (L, 0), which is precisely the maximum cut problem on G with weights − a ij for (i, j) ∈ E.
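A small numerical sketch of this reduction (random symmetric A with zero diagonal, an illustrative assumption):

```python
# Relate an Ising QUBO (A, 0) to max cut via the weighted Laplacian.
import itertools
import numpy as np

rng = np.random.default_rng(7)
n = 4
A = rng.integers(-3, 4, (n, n)).astype(float)
A = (A + A.T) / 2.0
np.fill_diagonal(A, 0.0)

W = -A                                # edge weights w_ij = -a_ij
D = np.diag(W.sum(axis=1))            # weighted degree matrix
L = D - W                             # weighted Laplacian; here A = L - D

for bits in itertools.product((-1, 1), repeat=n):
    y = np.array(bits, dtype=float)
    S = y > 0
    cut = W[np.ix_(np.where(S)[0], np.where(~S)[0])].sum()
    assert np.isclose(cut, 0.25 * (y @ L @ y))   # w(S, S-bar) = y^T L y / 4
    assert np.isclose(y @ A @ y, y @ L @ y - np.trace(D))
```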

1.4.3 Equivalence with the Stable Set Problem

We have seen that the maximum weight stable set problem can be formulated as a QUBO. We now show that any instance of QUBO can be formulated as a maximum weight stable set problem. Our discussion here follows the paper by Hammer, Hansen and Simeone [35]. First, let us rewrite the objective function of QUBO so that all the quadratic terms have non-negative coefficients. After rewriting, the quadratic terms could involve both the original binary variables and their complements. Recall that

$$\displaystyle \begin{aligned} f(\mathbf{x})=\sum_{(i,j)\in P}q_{ij}x_ix_j +\sum_{(i,j)\in N}q_{ij}x_ix_j+\sum_{i=1}^n(q_{ii}+c_i)x_i, \end{aligned}$$

where P = {(i, j) : q ij > 0, i ≠ j} and N = {(i, j) : q ij < 0, i ≠ j}. Also, let N j = {i : (i, j) ∈ N} and \(\rho _j=\sum _{i\in N_j}q_{ij}\). Let \(\bar {x}_i=1-x_i\) be the complement of x i; i.e. x i ∈{0, 1} if and only if \(\bar {x}_i\in \{0,1\}\). Now,

$$\displaystyle \begin{aligned} f(\mathbf{x})&=\sum_{(i,j)\in P}q_{ij}x_ix_j +\sum_{(i,j)\in N}q_{ij}(1-\bar{x}_i)x_j+\sum_{i=1}^n(q_{ii}+c_i)x_i\\ &= \sum_{(i,j)\in P}q_{ij}x_ix_j +\sum_{(i,j)\in N}(-q_{ij})\bar{x}_ix_j+\sum_{(i,j)\in N}q_{ij}x_j+\sum_{i=1}^n(q_{ii}+c_i)x_i,\\ {}&=\sum_{(i,j)\in P}q_{ij}x_ix_j +\sum_{(i,j)\in N}(-q_{ij})\bar{x}_ix_j+\sum_{i=1}^n(q_{ii}+c_i+\rho_i)x_i\\ & = h(\mathbf{x}), \mbox{ say}. \end{aligned} $$
(1.12)

Note that the coefficients of x i x j and \(\bar {x}_ix_j\) in h(x) are positive. This representation h(x) of f(x) is sometimes referred to as the Rhys form [35], indicating an early work on a special case of QUBO [72], and is one of the posiform representations of f(x) [7, 8, 35] when viewed as a pseudo-Boolean function. For interesting properties of posiforms of the QUBO objective function, we refer to Chap. 5. Note that maximizing f(x) over x ∈{0, 1}n is equivalent to maximizing h(x) over \(x_i,\bar {x_i}\in \{0,1\}\). We now show that the problem of maximizing h(x) over \(x_i,\bar {x_i}\in \{0,1\}\) is equivalent to solving a maximum weight stable set problem. The proof is based on a construction given in [35].

For each (i, j) ∈ P ∪ N introduce a node v ij with weight |q ij|. Also, for i = 1, 2, …, n, introduce two nodes i and i′ with respective weights w i = q ii + c i + ρ i + M and \(w_{i'}=M\), where M is a large positive number. Now introduce the edges (v ij, i′), (v ij, j′) for each (i, j) ∈ P and the edges (v ij, i), (v ij, j′) for each (i, j) ∈ N. Finally, connect each node i with node i′ by an edge (i, i′), for i = 1, 2, …, n. Let G′ = (V, E′) be the resulting graph.

Define the product node variables y ij where

$$\displaystyle \begin{aligned} y_{ij} = \begin{cases} 1 &\text{ if vertex}\ v_{ij}\ \text{is selected}\\ 0 &\text{ if vertex}\ v_{ij}\ \text{is not selected.} \end{cases} \end{aligned}$$

Similarly, define the selection variables for nodes i and i′ as

$$\displaystyle \begin{aligned} x_{i} = \begin{cases} 1 &\text{ if vertex}\ i\ \text{is selected}\\ 0 &\text{ if vertex}\ i\ \text{is not selected} \end{cases} \quad \mathrm{and}\quad x^{\prime}_i = \begin{cases} 1 &\text{ if vertex}\ i'\ \text{is selected}\\ 0 &\text{ if vertex}\ i'\ \text{is not selected} \end{cases} \end{aligned}$$

Then, the standard integer programming formulation of the maximum weight stable set problem (MWSSP) on G′, which we denote by SSIP, is

$$\displaystyle \begin{aligned} \mbox{SSIP: Maximize } &\phi(\mathbf{y},\mathbf{x},\mathbf{x}')=\sum_{(i,j)\in P\cup N}|q_{ij}|y_{ij}+\sum_{i=1}^nw_ix_i+\sum_{i=1}^nw_{i'}x^{\prime}_i\\ \mbox{Subject to: }~~& y_{ij}+x^{\prime}_i\leq 1,\ y_{ij}+x^{\prime}_j\leq 1 \mbox{ for all } (i,j)\in P \end{aligned} $$
(1.13)
$$\displaystyle \begin{aligned} & y_{ij}+x_i\leq 1,\ y_{ij}+x^{\prime}_j\leq 1 \mbox{ for all } (i,j)\in N \end{aligned} $$
(1.14)
$$\displaystyle \begin{aligned} & x_i+x^{\prime}_i\leq 1 \mbox{ for } i=1,2,\ldots,n \end{aligned} $$
(1.15)
$$\displaystyle \begin{aligned} & y_{ij},\ x_i,\ x^{\prime}_i\in \{0,1\} \mbox{ for all } (i,j)\in P\cup N,\ i=1,2,\ldots,n. \end{aligned} $$
(1.16)

Theorem 1.2

If (y, x, x′) is an optimal solution to the SSIP defined above, then x is an optimal solution to the QUBO with objective function f(x).

Proof

Recall that optimizing f(x) over x ∈{0, 1}n is equivalent to optimizing h(x) over \(\mathbf {x},\bar {\mathbf {x}}\in \{0,1\}^n\). Let ϕ(y, x, x′) denote the objective function of SSIP. Since M is large enough, \(x^{\prime }_i=1-x_i\) in every optimal solution to SSIP on G′, for otherwise a better solution can be obtained. The inequalities (1.13) imply that \(y_{ij}\leq \min \{x_i,x_j\}\) in every optimal solution and hence y ij = 0 if at least one of x i or x j is zero. If x i = x j = 1 then y ij = 1 in an optimal solution since q ij > 0 for (i, j) ∈ P. Thus y ij = x i x j in an optimal solution of SSIP on G′. Similarly, we can show that \(y_{ij}=x^{\prime }_ix_j\) in an optimal solution for (i, j) ∈ N. Noting that \(x^{\prime }_i=1-x_i=\bar {x}_i\), we have

$$\displaystyle \begin{aligned}\phi(\mathbf{y},\mathbf{x}, \mathbf{x}')=h(\mathbf{x})+nM \end{aligned} $$
(1.17)

for every optimal solution (y, x, x′) of SSIP on G′.

Conversely, for any x ∈{0, 1}n define y ij = x i x j for all (i, j) ∈ P, y ij = (1 − x i)x j for all (i, j) ∈ N, and \(x^{\prime }_i=\bar {x}_i=1-x_i\). The solution (y, x, x′) constructed above is indeed a feasible solution to SSIP on G′ satisfying

$$\displaystyle \begin{aligned}\phi(\mathbf{y},\mathbf{x}, \mathbf{x}')=h(\mathbf{x})+nM \end{aligned} $$
(1.18)

and the result follows. □

Example 1.1

Consider the QUBO with objective function

$$\displaystyle \begin{aligned}f(\mathbf{x})=7x_1x_2-3x_1x_3-12x_1x_4+4x_2x_3+8x_2x_4+3x_1-10x_2+5x_4.\end{aligned}$$

Then, P = {(1, 2), (2, 3), (2, 4)} and N = {(1, 3), (1, 4)}. The cost matrix Q, the sets N i, i = 1, 2, 3, 4, and the values ρ i and q ii + c i + ρ i, for i = 1, 2, 3, 4 are given below:

$$\displaystyle \begin{aligned}\mathbf{Q}=\begin{pmatrix} 0 & 7 &-3 &-12\\ 0 & 0 & 4 & 8\\ 0 & 0 & 0& 0\\ 0 & 0 & 0& 0 \end{pmatrix},\end{aligned}$$

N 1 = N 2 = ∅, N 3 = {1}, N 4 = {1}; ρ 1 = ρ 2 = 0, ρ 3 = −3, ρ 4 = −12; and the values q ii + c i + ρ i for i = 1, 2, 3, 4 are 3, −10, −3, and −7, respectively.

Now construct the graph G′ = (V, E′) for the stable set problem as discussed above. The resulting graph is given in Fig. 1.3. In the figure, the numbers shown outside the nodes represent the weights of the nodes.

Fig. 1.3 The graph for the maximum weight stable set problem constructed from the QUBO in Example 1.1.

It can be verified that x 1 = 0, x 2 = x 3 = x 4 = 1 is an optimal solution to the QUBO with optimal objective function value 7. An optimal solution to the constructed maximum weight stable set problem is S = {v 13, v 14, v 23, v 24, 1′, 2, 3, 4} with value equal to 7 + 4M. The QUBO solution recovered from S is x 1 = 0, x 2 = x 3 = x 4 = 1, which is optimal with objective function value 7.

Considering the equivalence between the maximum weight stable set problem and the maximum weight clique problem, we can see that QUBO is equivalent to the maximum weight clique problem as well.

1.4.4 QUBO and the Generalized Stable Set Problem

Let G = (V, E) be a graph and c i be the weight associated with vertex i ∈ V . Also, for each edge (i, j) ∈ E a cost q ij is given. Recall that a stable set in G is a subset S of V  such that no two nodes in S are adjacent in G. In the generalized stable set problem [37, 39, 65], we relax the restriction that no two vertices in S are adjacent by imposing a penalty. That is, if two vertices in S are adjacent in G, a penalty q ij is incurred. The generalized stable set problem on a complete graph is precisely a QUBO. If G is not complete, we can define q ij = 0 for (i, j)∉E to yield a QUBO formulation. The generalized stable set problem and some of its variations are studied by many authors [4, 37, 39, 65].

1.4.5 Quadratic Pseudo-Boolean Optimization

A quadratic function in variables \(X=\{x_1,x_2,\ldots ,x_n, \bar {x}_1,\bar {x}_2,\ldots ,\bar {x}_n\}\) where \(x_i,\bar {x}_i\in \{0,1\}\) is called a quadratic pseudo-Boolean function (quadratic PBF) [8]. The objective function of the maximum-cut problem is a quadratic PBF and so is the objective function of QUBO. A quadratic PBF is in posiform if all associated coefficients (except possibly the constant term) are positive. A posiform is homogeneous if the associated constant is zero. A homogeneous posiform ζ can be written as

$$\displaystyle \begin{aligned}\zeta(\mathbf{x},\bar{\mathbf{x}})=\sum_{i<j}\left(a_{ij}x_ix_j+b_{ij}\bar{x}_ix_j+c_{ij}x_i\bar{x}_j+d_{ij}\bar{x}_i\bar{x}_j\right) + \sum_{i=1}^n\left(\alpha_ix_i+\beta_i\bar{x}_i\right)\end{aligned}$$

The function h(x) given in Eq. (1.12) has positive coefficients for the quadratic terms, but the associated linear terms may still have positive and negative coefficients. Note that \(-x_i=\bar {x}_i-1\) and \(-\bar {x_i}=x_i-1\). Using these transformations, we can construct a homogeneous posiform \(h^1(\mathbf {x},\bar {\mathbf {x}})\) such that \(f(\mathbf {x})=\alpha _1-h^1(\mathbf {x},\bar {\mathbf {x}})\) for all x ∈{0, 1}n, where α 1 is a constant. Likewise, we can construct a homogeneous posiform \(h^2(\mathbf {x},\bar {\mathbf {x}})\) such that \(f(\mathbf {x})=\alpha _2+h^2(\mathbf {x},\bar {\mathbf {x}})\) for all x ∈{0, 1}n.

Lemma 1.4 ([35])

\(f(\mathbf {x})=\alpha _1 - \zeta ^1(\mathbf {x},\bar {\mathbf {x}})=\alpha _2 + \zeta ^2(\mathbf {x},\bar {\mathbf {x}})\) , where ζ 1 and ζ 2 are homogeneous posiforms and α 1 and α 2 are constants.

Proof

Without loss of generality, we assume that the matrix Q is upper triangular and the diagonal entries are zero (see Sect. 1.3). Thus

$$\displaystyle \begin{aligned}f(\mathbf{x}) = \sum_{i<j}q_{ij}x_ix_j +\sum_{i=1}^nc_ix_i.\end{aligned}$$

Note that when x i and x j are binary variables,

$$\displaystyle \begin{aligned}x_ix_j=\frac{1}{2}\left(x_ix_j+\bar{x}_i\bar{x}_j + x_i+x_j-1\right)\end{aligned}$$

and

$$\displaystyle \begin{aligned}-x_ix_j=\frac{1}{2}\left(x_i\bar{x}_j+\bar{x}_i x_j - x_i-x_j\right)\end{aligned} $$

Substitute these values in f(x) and simplify. Then, in the linear terms, if the coefficient of x i or \(\bar {x}_i\) is negative, make a substitution using the equalities \(-x_i=\bar {x}_i-1\) and \(-\bar {x_i}=x_i-1\), and we get the first equality in the lemma. An analogous proof can be given for the second part as well. □

The homogeneous posiforms ζ 1 and ζ 2 are not unique. The function h(x) constructed in Eq. (1.12) has positive coefficients for all quadratic terms and can be converted into a posiform by applying the transformation for linear terms used in the proof above. There are many other ways to obtain the required posiforms in the lemma above.

Since \(\zeta ^1(\mathbf {x},\bar {\mathbf {x}})\) and \(\zeta ^2(\mathbf {x},\bar {\mathbf {x}})\) are nonnegative for all x ∈{0, 1}n, we have

$$\displaystyle \begin{aligned}\alpha_2 \leq f(\mathbf{x}) \leq \alpha_1 \mbox{ for all } \mathbf{x} \in \{0,1\}^n.\end{aligned} $$

Let α be the smallest real number for which there exists a homogeneous posiform \(\zeta (\mathbf {x},\bar {\mathbf {x}})\) such that \(f(\mathbf {x})=\alpha - \zeta (\mathbf {x},\bar {\mathbf {x}})\). Then, f(x) ≤ α for all x ∈{0, 1}n and \(\zeta (\mathbf {x},\bar {\mathbf {x}})\) is called the complement of f(x). Thus, by computing the complement of f(x) we get an immediate upper bound on f(x). For interesting discussions on computing the best upper bound of this type and other related results, we refer to [35].

1.5 Roof Duality

Consider the objective function of QUBO in Rhys form h(x); i.e.,

$$\displaystyle \begin{aligned}h(\mathbf{x})=\sum_{(i,j)\in P}q_{ij}x_ix_j +\sum_{(i,j)\in N}q_{ij}\bar{x}_ix_j+\sum_{i=1}^n l_ix_i,\end{aligned}$$

where, with a slight abuse of notation, q ij now denotes the positive coefficient of the corresponding quadratic term of h(x).

Without loss of generality assume that q ij = 0 if i > j. Recall that P = {(i, j) : q ij x i x j ∈ h(x), i ≠ j, q ij ≠ 0} and \(N=\{(i,j) : q_{ij}\bar {x}_ix_j \in h(\mathbf {x}), i\neq j, q_{ij}\neq 0\}\). The set inclusion notation q ij x i x j ∈ h(x) used here simply indicates that q ij x i x j is a term in h(x).

For q ij ≥ 0 and x i, x j ∈{0, 1}, we have q ij x i x j ≤ λ ij x i + μ ij x j and \(q_{ij}\bar {x}_ix_j \leq \lambda _{ij}(1-x_i)+\mu _{ij}x_j\) whenever λ ij + μ ij = q ij, λ ij ≥ 0, μ ij ≥ 0. A roof of h(x) is a linear 0-1 function of the form [35]

$$\displaystyle \begin{aligned} r(\mathbf{x},\boldsymbol{\lambda},\boldsymbol{\mu})=\sum_{i=1}^nl_ix_i + \sum_{(i,j)\in P}\left(\lambda_{ij}x_i+\mu_{ij}x_j\right ) + \sum_{(i,j)\in N}\left(\lambda_{ij}(1-x_i)+\mu_{ij}x_j\right ) \end{aligned} $$
(1.19)

where λ ij + μ ij = q ij and λ ij ≥ 0, μ ij ≥ 0 for all (i, j) ∈ P ∪ N. By construction

$$\displaystyle \begin{aligned}h(\mathbf{x}) \leq r(\mathbf{x},\boldsymbol{\lambda},\boldsymbol{\mu}) \mbox{ for all }\mathbf{x} \in \{0,1\}^n.\end{aligned}$$

Let G = (V, E) be a graph with V = {1, 2, …, n} and E = P ∪ N. For each i ∈ V , define the set of outgoing and incoming arcs at node i by

$$\displaystyle \begin{aligned}O(i) = \{j\in V : (i,j)\in E\} \mbox{ and } I(i)=\{k\in V : (k,i)\in E\}.\end{aligned} $$

Then the coefficient, p i(λ, μ), of x i in r(x, λ, μ) is given by

$$\displaystyle \begin{aligned}p_i(\boldsymbol{\lambda},\boldsymbol{\mu})=l_i+\sum_{k\in I(i)}\mu_{ki}+\sum_{j\in O(i)}\delta_{ij}\lambda_{ij}, \end{aligned}$$

where

$$\displaystyle \begin{aligned} \delta_{ij}= \begin{cases} 1 &\text{ if }\ (i,j)\in P\\ -1 &\text{if }\ (i,j)\in N \end{cases} \end{aligned} $$

Thus, r(x, λ, μ) can be written as

$$\displaystyle \begin{aligned} r(\mathbf{x},\boldsymbol{\lambda},\boldsymbol{\mu}) = \sum_{i=1}^np_i(\boldsymbol{\lambda},\boldsymbol{\mu})x_i +\sum_{(i,j)\in N}\lambda_{ij} \end{aligned} $$

where λ ij + μ ij = q ij, λ ij ≥ 0, μ ij ≥ 0. Then the roof dual of QUBO [7, 35], when given in terms of h(x), is

$$\displaystyle \begin{aligned} \text{Minimize }&\max\{r(\mathbf{x},\boldsymbol{\lambda},\boldsymbol{\mu}) : \mathbf{x}\in \{0,1\}^n\} \\ \mbox{Subject to: }~~& \lambda_{ij}+\mu_{ij}=q_{ij} \mbox{ for all } (i,j)\in P\cup N\\ &\lambda_{ij}\geq 0, \mu_{ij}\geq 0 \mbox{ for all } (i,j)\in P\cup N. \end{aligned} $$

Since r(x, λ, μ) is linear in x, its maximum over {0, 1}n is attained by setting x i = 1 whenever p i(λ, μ) ≥ 0 and x i = 0 otherwise; i.e.,

$$\displaystyle \begin{aligned}\max\{r(\mathbf{x},\boldsymbol{\lambda},\boldsymbol{\mu}) : \mathbf{x}\in \{0,1\}^n\} = \sum_{i=1}^n\max\{0,p_i(\boldsymbol{\lambda},\boldsymbol{\mu})\} +\sum_{(i,j)\in N}\lambda_{ij}.\end{aligned}$$

Thus the roof dual [6, 35] can be written as

$$\displaystyle \begin{aligned} \text{RD: Minimize}&\sum_{i=1}^nu_i +\sum_{(i,j)\in N}\lambda_{ij} \\ \mbox{Subject to: }~~& u_i\geq p_i(\boldsymbol{\lambda},\boldsymbol{\mu})\\ &\lambda_{ij}+\mu_{ij}=q_{ij} \mbox{ for all } (i,j)\in P\cup N\\ &\lambda_{ij}\geq 0, \mu_{ij}\geq 0 \mbox{ for all } (i,j)\in P\cup N\\ &u_i \geq 0 \mbox{ for }i=1,2,\ldots ,n. \end{aligned} $$

Note that RD is a linear program, and it is closely linked to network flows on the support graph G. The optimal objective function value of RD provides an upper bound for the optimal objective function value of QUBO. The dual of RD is precisely the continuous relaxation of the linearization of the Rhys form of QUBO. Thus, the upper bound obtained by the roof dual is precisely the upper bound obtained by solving the continuous relaxation of the linearization of the Rhys form of QUBO. Let us consider an example to illustrate the concept of the roof dual.

Example 1.2

Let f(x) = 10x 1 x 2 −5x 1 x 3 +20x 1 x 4 −12x 2 x 4 −2x 3 x 4 −6x 1 −5x 2 +8x 3 +5x 4. Then P = {(1, 2), (1, 4)} and N = {(1, 3), (2, 4), (3, 4)}. Using the transformation \(x_i=1-\bar {x}_i\) for (i, j) ∈ N we obtain the Rhys form h(x) given by

$$\displaystyle \begin{aligned}h(\mathbf{x})=10x_1x_2+5\bar{x}_1x_3+20x_1x_4+12\bar{x}_2x_4+2\bar{x}_3x_4-6x_1-5x_2+3x_3-9x_4.\end{aligned} $$

The support graph associated with h(x) is given below (Fig. 1.4).

Fig. 1.4 The support graph of h(x)

Note that, O(1) = {2, 3, 4}, O(2) = {4}, O(3) = {4}, O(4) = ∅ and I(1) = ∅, I(2) = {1}, I(3) = {1}, I(4) = {1, 2, 3}. Therefore,

$$\displaystyle \begin{aligned} p_1(\boldsymbol{\lambda},\boldsymbol{\mu})&= -6+\lambda_{12}-\lambda_{13}+\lambda_{14}\\ p_2(\boldsymbol{\lambda},\boldsymbol{\mu})&=-5-\lambda_{24}+\mu_{12}\\ p_3(\boldsymbol{\lambda},\boldsymbol{\mu})&=3-\lambda_{34}+\mu_{13}\\ p_4(\boldsymbol{\lambda},\boldsymbol{\mu})&=-9+\mu_{14}+\mu_{24}+\mu_{34}. \end{aligned} $$

Then, the roof dual RD is given by,

$$\displaystyle \begin{aligned} \text{RD: Minimize}&\sum_{i=1}^nu_i +\lambda_{13} + \lambda_{24} +\lambda_{34}\\ \mbox{Subject to: }~~& u_1-\lambda_{12}+\lambda_{13}-\lambda_{14}\geq -6\\ & u_2+\lambda_{24}-\mu_{12}\geq -5\\ & u_3+\lambda_{34}-\mu_{13}\geq 3\\ &u_4 -\mu_{14}-\mu_{24}-\mu_{34}\geq -9\\ &\lambda_{ij}+\mu_{ij}=q_{ij} \mbox{ for all } (i,j)\in P\cup N\\ &\lambda_{ij}\geq 0, \mu_{ij}\geq 0 \mbox{ for all } (i,j)\in P\cup N\\ &u_i \geq 0 \mbox{ for }i=1,2,\ldots ,4. \end{aligned} $$

The dual of this linear program is

$$\displaystyle \begin{aligned} \text{Maximize}& -6x_1-5x_2+3x_3-9x_4+10y_{12}+5y_{13}+20y_{14}+12y_{24}+2y_{34}\\ \mbox{Subject to: }~~& y_{12}\leq x_1, y_{12}\leq x_2\\ & y_{13}\leq 1-x_1, y_{13}\leq x_3\\ &y_{14}\leq x_1, y_{14}\leq x_4\\ &y_{24}\leq 1-x_2, y_{24}\leq x_4\\ &y_{34}\leq 1-x_3, y_{34}\leq x_4\\ &0\leq x_i\leq 1, i=1,2,3,4\\ & y_{ij}\mbox{ unrestricted for all }(i,j)\in P\cup N. \end{aligned} $$

and this is the continuous relaxation of the linearization of h(x). (The notion of linearization is discussed in Sect. 1.7 and Chaps. 4 and 6.)
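For completeness, the roof dual of Example 1.2 can also be solved directly as a linear program. The sketch below uses SciPy's linprog; the variable ordering and data layout are our own illustrative choices:

```python
# Solve the roof dual RD of Example 1.2 as an explicit LP.
import numpy as np
from scipy.optimize import linprog

edges = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]
q = {(1, 2): 10, (1, 3): 5, (1, 4): 20, (2, 4): 12, (3, 4): 2}
nu, ne = 4, len(edges)
lam = {e: nu + k for k, e in enumerate(edges)}          # lambda_e indices
mu = {e: nu + ne + k for k, e in enumerate(edges)}      # mu_e indices
nvar = nu + 2 * ne

obj = np.zeros(nvar)
obj[:nu] = 1.0                                          # sum of u_i
for e in [(1, 3), (2, 4), (3, 4)]:                      # N edges add lambda_e
    obj[lam[e]] = 1.0

# u_i >= p_i(lambda, mu), rewritten as <= rows for linprog.
cons = [({0: -1, lam[1, 2]: 1, lam[1, 3]: -1, lam[1, 4]: 1}, 6),
        ({1: -1, lam[2, 4]: -1, mu[1, 2]: 1}, 5),
        ({2: -1, lam[3, 4]: -1, mu[1, 3]: 1}, -3),
        ({3: -1, mu[1, 4]: 1, mu[2, 4]: 1, mu[3, 4]: 1}, 9)]
A_ub = np.zeros((4, nvar)); b_ub = np.zeros(4)
for k, (coeffs, rhs) in enumerate(cons):
    b_ub[k] = rhs
    for var, val in coeffs.items():
        A_ub[k, var] = val

A_eq = np.zeros((ne, nvar)); b_eq = np.zeros(ne)        # lambda_e + mu_e = q_e
for k, e in enumerate(edges):
    A_eq[k, lam[e]] = A_eq[k, mu[e]] = 1.0
    b_eq[k] = q[e]

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
assert res.status == 0
print(res.fun)    # optimal value of RD: an upper bound on max h(x)
```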

1.6 Model Building Using QUBO

Let us now look at some motivating applications of the QUBO model. For a detailed discussion on the applications of QUBO and its power in providing a unifying framework to represent a large class of combinatorial optimization problems, we refer to Chap. 2.

1.6.1 Person Detection and Tracking in a Crowded Environment

Identifying and tracking people is an important problem in a variety of application scenarios. These include examination and exploration of group behaviour, video surveillance, pedestrian detection systems, and disaster management, among others. The problem however is very complex, particularly due to the high level of occlusions, and researchers in computer vision have developed various techniques to solve it making use of statistical machine learning, mathematical modelling, and optimization [16, 73, 74]. Let us now discuss one such application modelled as a QUBO by Rodriguez et al. [73].

Let N = {1, 2, …, n} be a set of points identified in an image as possible person detection locations and let c i be an associated confidence score for location i, for i ∈ N. The point i and the score c i can be estimated in different ways, for example by appropriately trained preprocessing algorithms. We are also given person density information d i (i.e. the number of people per pixel) estimated in a window of size σ at location i, for i ∈ N. We want to find locations of people in the image such that the sum of detector confidence scores is maximized while making use of the density information to minimize the selection of locations with significantly overlapping neighborhoods, which in turn minimizes potential counting errors accumulated due to multiple detections of the same person. Figure 1.5a gives a sample shot of a crowd and (b) represents the density contours. Part (c) of the figure gives potential locations and (d) gives the filtered solution produced by QUBO. Note that some of the red rectangles, which are potential locations, are either removed as irrelevant or confirmed in the QUBO solution.

Fig. 1.5 Person detection sample and crowd density contours

Let x = (x 1, x 2, …, x n) ∈{0, 1}n, where x i = 1 if a detection at location i is confirmed and x i = 0 otherwise. Then \(\mathbf {c}\mathbf {x}^T\) measures the total confidence score of ‘confirmed’ locations. To make sure that only valid configurations of non-overlapping locations are selected, we construct a penalty matrix W, where w ij = −M, for a large positive number M, if detections at locations i and j have a significant area overlap ratio, and 0 otherwise. Then, maximizing \(\mathbf {c}\mathbf {x}^T+\mathbf {x}\mathbf {W}\mathbf {x}^T\) provides a meaningful model to represent the person detection problem. This, in fact, is a variation of a model proposed in [16].

To improve the accuracy of the model, Rodriguez et al. [73] introduced another quadratic term penalizing the difference between the density values estimated in two ways: (i) the vector d obtained using a regression-based density estimator and (ii) the vector Ax counting the density of active detections for an appropriately defined A matrix. This leads to the penalty term \(||\mathbf {d}-\mathbf {Ax}||{ }_1^2\). Then, the objective function to be maximized is

$$\displaystyle \begin{aligned}\mathbf{c}\mathbf{x}^T+\mathbf{x}\mathbf{W}\mathbf{x}^T-\alpha ||\mathbf{d}-\mathbf{Ax}||{}_1^2,\end{aligned}$$

where x ∈{0, 1}n and α is a parameter. This problem can be written as a QUBO by simplifying \(||\mathbf {d}-\mathbf {Ax}||{ }_1^2\) and combining it with \(\mathbf {c}\mathbf {x}^T+\mathbf {x}\mathbf {W}\mathbf {x}^T\).
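The following sketch shows how such a density penalty folds into QUBO form. Note the hedge: the text uses the squared ℓ1 penalty, while for brevity we expand the squared Euclidean norm \(||\mathbf {d}-\mathbf {Ax}||{ }_2^2\), which is directly a quadratic in x; all data are illustrative assumptions.

```python
# Expand ||d - Ax||^2 (squared Euclidean, our simplifying assumption)
# into a QUBO quadratic-plus-linear form for binary x.
import numpy as np

rng = np.random.default_rng(8)
m, n = 5, 4
A = rng.random((m, n))
d = rng.random(m)

G = A.T @ A                      # quadratic part of ||d - Ax||^2
lin = -2.0 * A.T @ d             # linear part; d.d is a discardable constant
# Since x_i^2 = x_i, move the diagonal of G into the linear term.
Q_pen = G - np.diag(np.diag(G))
c_pen = lin + np.diag(G)

x = rng.integers(0, 2, n).astype(float)
lhs = np.sum((d - A @ x) ** 2) - d @ d
assert np.isclose(lhs, x @ Q_pen @ x + c_pen @ x)
```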

1.6.2 Module Flipping in Circuit Layout Design

Let us now look at another example of QUBO that arises in the layout design of VLSI (very large scale integration) circuits. Our discussion here is based on the works of Boros and Hammer [7] and Cheng et al. [12]. In the layout design of circuits, rectangular modules containing one or more pins are embedded in a base board. Each pin on a module has a compatible pin on another module, and these compatible pairs need to be connected using horizontal and/or vertical wiring. Reliability has an inverse relationship with wire length, and hence we are interested in minimizing the total wire length. Each module can be placed in four different ways on its designated space on the base board by flipping it horizontally, vertically, or both. In Fig. 1.6a, we give a layout of modules, and in Fig. 1.6b we give a layout after vertically flipping module 1, horizontally flipping module 2, and applying a vertical flip followed by a horizontal flip on module 5. The compatible pairs of pins are labelled using the same letters. For example, a pin with label a needs to be connected to another with the same label a, and so on.

Fig. 1.6 Module flipping example. Applied a vertical flip on module 1, horizontal flip on module 2, vertical flip followed by horizontal flip on module 5. (a) Original position. (b) After flip operations are applied on modules 1, 2, and 5

The module flipping problem is to find the orientations of the placement of each module on the base board, identified by one or more flipping operations, so that the total wire length is minimized. This problem was originally proposed by Cheng et al. [12], who presented a maximum cut formulation, which, as we know, is equivalent to QUBO. Boros and Hammer [7] presented a direct QUBO formulation of the problem, and we discuss this model here.

Let N = {1, 2, …, n} be the collection of all modules placed on the base board with an initial layout. Consider the binary decision variable x i which takes value 1 if module i is flipped horizontally, for i = 1, 2, …, n. Likewise, consider the binary decision variable y i which takes value 1 if module i is flipped vertically, for i = 1, 2, …, n. For each pair (i, j) of modules and (r, s) ∈{0, 1}×{0, 1}, let \(H^{(r,s)}_{ij}\) denote the total length of the horizontal wire segments connecting the pins of modules i and j under the flipping sequence (r, s). For example, if (r, s) = (0, 0) no horizontal flipping occurs for modules i and j; if (r, s) = (0, 1), only module j is flipped horizontally, and so on. Similarly, let \(V^{(r,s)}_{ij}\) denote the total length of the vertical wire segments connecting the pins of modules i and j under the flipping sequence (r, s). Now, the total wire length, as a function of the flipping operations of the modules, can be expressed as

$$\displaystyle \begin{aligned} \phi(\mathbf{x},\mathbf{y}) = &\sum_{i=1}^{n-1}\sum_{j=i+1}^n\left(H^{(0,0)}_{ij}\bar{x}_i\bar{x}_j+H^{(0,1)}_{ij}\bar{x}_ix_j+H^{(1,0)}_{ij}x_i\bar{x}_j+H^{(1,1)}_{ij}x_ix_j\right)\\ &+ \sum_{i=1}^{n-1}\sum_{j=i+1}^n\left(V^{(0,0)}_{ij}\bar{y}_i\bar{y}_j+V^{(0,1)}_{ij}\bar{y}_iy_j+V^{(1,0)}_{ij}y_i\bar{y}_j+V^{(1,1)}_{ij}y_iy_j\right) \end{aligned} $$
(1.20)

where \(\bar {x}_i=1-x_i\) and \(\bar {y}_i=1-y_i\). Then, the optimal flipping of the modules corresponds to assigning 0-1 values to the variables x i and y i for i ∈ N such that ϕ(x, y) is minimized. It can be verified that the minimization of ϕ(x, y) decomposes into two problems, one involving the x-variables and the other involving the y-variables, and each such problem is a QUBO.

Related applications of QUBO in equivalent forms such as Maximum Cut, weighted stable set etc., can be found in [3, 9, 15, 46, 54].

1.6.3 Side-Chain Positioning in Protein Design

The side-chain positioning problem, which arises as a subproblem in protein structure prediction, has a natural QUBO formulation. Our discussion here is based on the works [11, 30, 31, 47]. While the formulation itself is straightforward, to make the discussion clearer let us very briefly review some related concepts, terminology, and background information; for this purpose, we follow the articles [11, 31].

A protein molecule is composed of a chain of amino acids, and each amino acid consists of a single central carbon atom along with a hydrogen atom, an amino group NH2, a carboxyl group COOH, and a side chain which characterizes the amino acid. The carbon and hydrogen atoms, the amino group, and the carboxyl group are called the main atoms; the protein backbone is formed by a repeating sequence of the main atoms (main chain), and a side chain is attached to the backbone for each element of this sequence (Fig. 1.7).

Fig. 1.7 A chain of three amino acids. The labels S 1, S 2 and S 3 indicate the corresponding side-chains

The chemical composition of a protein molecule is specified by the sequence of the associated amino acids. Every amino acid main chain has the freedom to rotate at specified points. A side chain of amino acids (with the exception of glycine) can also rotate at different points and the three dimensional structure of a protein is identified by the location of its backbone (main chain) atoms and the combined rotations of the main chain and the side chains. For an appropriate notion of energy, a protein structure folds into a minimal energy state to reach chemical stability. Quoting from [31], “Chemical stability of a protein comes from four sources: the internal stability of the three-dimensional structure of its backbone; the internal stability of each rotamer; the interaction of each rotamer with the main chain of the amino acid it is attached to; and the chemical interactions of the rotamers that are positioned close to each other, either on the protein sequence, or in three-dimensional space.” In protein synthesis, we want to maximize the chemical stability using appropriate quantitative stability measures.

The backbone structure is assumed to be given, and our aim is to select one rotamer (side-chain molecule) to be associated with each amino acid sequence in the backbone. Let e 0 be the stability measure for the backbone structure, which is constant. Suppose that the backbone consists of m amino acid sequences and for amino acid k, we select a rotamer from the candidate list V k, k = 1, 2, …, m. For rotamer i ∈ V k let \(e^k_i\) be the contribution of i to the stability measure, which consists of the sum of the interaction stability measure of candidate rotamer i with its associated amino acid k and the internal stability measure of rotamer i. For each i ∈ V k and j ∈ V t with t ≠ k, let \(e^{kt}_{ij}\) be the stability measure due to the interaction between the rotamers i ∈ V k and j ∈ V t. This value depends on the nature of the rotamers involved, the nature of the amino acids they attach to, and the proximity level of i and j. Since the backbone is already known, the proximity level is also known. The values e 0, \(e^k_i\), and \(e^{kt}_{ij}\) are estimated either by experiments or by theoretical considerations. For our modeling purposes, the way these values are obtained and their magnitudes are not highly relevant. The side-chain positioning problem is to select an i k from V k for each k = 1, 2, …, m such that

$$\displaystyle \begin{aligned}e_0+\sum_{k=1}^m e^k_{i_k}+\frac{1}{2}\sum_{k=1}^m\sum_{t=1}^m e^{kt}_{i_ki_t}\end{aligned}$$

is maximized. Let us now formulate this problem as QUBO.

Let \(V=\cup _{k=1}^mV_k\) and, without loss of generality, assume V = {1, 2, …, n}. Construct the complete m-partite undirected graph G = (V, E) where (i, j) ∈ E if i ∈ V k, j ∈ V t, and t ≠ k. Note that we need to select precisely one element from each V k, k = 1, 2, …, m. For each i ∈ V  consider the binary decision variable x i, where x i = 1 if rotamer i is selected and x i = 0 otherwise. Define the n × n cost matrix Q with its (i, j)th element q ij as

$$\displaystyle \begin{aligned} q_{ij} = \begin{cases} 0 &\text{ if } i=j,\\ -M &\text{ if } i\neq j \text{ and } i,j\in V_k \text{ for some } k,\\ \frac{1}{2}e^{kt}_{ij} &\text{ if } (i,j)\in E \text{ with } i\in V_k,\ j\in V_t, \end{cases} \end{aligned}$$

where M is a large positive number, and the n-vector c is defined by \(c_{i}=e^k_i\) for i ∈ V k. Now an optimal solution to the QUBO instance \((\mathbf {Q},\mathbf {c},\max )\) gives an optimal solution to the side chain positioning problem. Note that no two elements from the same set V k will be selected in an optimal solution, for any k, because of the penalty value − M. Moreover, since the objective function is of maximization type with c i ≥ 0 for all i and \(e^{kt}_{ij}\geq 0\) for all k, t, i, j, at least one element from each V k will be selected, establishing the validity of the model.
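To make the construction concrete, the following sketch in Python assembles (Q, c) for a hypothetical two-amino-acid instance. The stability values in e_single and e_pair are made-up illustrative numbers, not data from the cited references; a minimal sketch, not an implementation from the literature.

```python
import numpy as np

# Hypothetical toy instance: m = 2 amino acids with rotamer candidate lists
# V_1 = {0, 1} and V_2 = {2, 3}, so V = {0, 1, 2, 3} and n = 4.
parts = [[0, 1], [2, 3]]                       # the sets V_k
n = 4
M = 1e6                                        # large positive penalty

e_single = {0: 1.0, 1: 2.5, 2: 0.5, 3: 1.5}    # e^k_i (illustrative values)
e_pair = {(0, 2): 2.0, (0, 3): 0.0,            # e^{kt}_{ij} for (i, j) in E
          (1, 2): 1.0, (1, 3): 3.0}

Q = np.zeros((n, n))
c = np.array([e_single[i] for i in range(n)])  # c_i = e^k_i for i in V_k

for k, Vk in enumerate(parts):
    for t, Vt in enumerate(parts):
        if k == t:
            for i in Vk:                       # penalize selecting two rotamers
                for j in Vk:                   # from the same amino acid
                    if i != j:
                        Q[i, j] = -M
        else:
            for i in Vk:
                for j in Vt:
                    e = e_pair.get((i, j), e_pair.get((j, i), 0.0))
                    Q[i, j] = 0.5 * e          # (1/2) e^{kt}_{ij}

# (Q, c, max) is now a QUBO instance; the constant e_0 is dropped since it
# does not affect the optimal selection.
```

For an instance this small, one can verify the model by enumerating all 2^n binary vectors and checking that the best objective value selects exactly one rotamer per amino acid.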

1.6.4 QUBO and Machine Scheduling

We now discuss a special case of QUBO that serves as a general framework for modeling several important machine scheduling problems. A matrix Q is called a half-product matrix if it is upper triangular with zero diagonal and there exist vectors a = (a 1, a 2, …, a n) and b = (b 1, b 2, …, b n) such that q ij = a i b j for i = 1, 2, …, n and j > i. A half-product QUBO is a special case of QUBO where Q is a half-product matrix. Thus, a QUBO with a half-product matrix can be written as

$$\displaystyle \begin{aligned} \begin{array}{ll} \mbox{Maximize }&\sum\limits_{1\leq i < j\leq n}a_ib_jx_ix_j +\sum\limits_{j=1}^nc_jx_j\\ \mbox{Subject to}&\\ &\mathbf{x}\in \{0,1\}^n \end{array} \end{aligned}$$

The half-product QUBO was introduced by Badics and Boros [2] and independently by Kubiak [52]. Despite its apparently simple special structure, the half-product QUBO is still NP-hard, since the subset sum problem is a special case of it [2].
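One practical consequence of the structure is that a half-product objective can be stored with O(n) data and evaluated in O(n) time without ever forming Q. The sketch below is a minimal illustration of both points, using the vectors a, b, c of the formulation above; it is not code from [2] or [52].

```python
import numpy as np

def half_product_matrix(a, b):
    """Upper triangular Q with q_ij = a_i * b_j for j > i and zero diagonal."""
    n = len(a)
    Q = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            Q[i, j] = a[i] * b[j]
    return Q

def half_product_objective(a, b, c, x):
    """sum_{i<j} a_i b_j x_i x_j + sum_j c_j x_j, evaluated in O(n) time."""
    total, prefix = 0.0, 0.0       # prefix holds sum_{i<j} a_i x_i
    for j in range(len(x)):
        total += b[j] * x[j] * prefix + c[j] * x[j]
        prefix += a[j] * x[j]
    return total
```

The linear-time evaluation works because the quadratic term factors as \(\sum _j b_jx_j\left (\sum _{i<j}a_ix_i\right )\), which can be accumulated in a single pass.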

Various single machine scheduling problems can be formulated as half-product QUBOs. These include the single-machine variance minimization problem [2, 52], single machine scheduling to minimize total weighted earliness and tardiness [43], and scheduling with controllable processing times [41]. The Ising version of the half-product QUBO was considered by Kubiak [52], Amit [1], and Mattis [58]. Scheduling jobs on two machines to minimize makespan [43] as well as scheduling on two machines to minimize total weighted completion time [43] can also be formulated as half-product QUBOs. Let us discuss one such formulation using the example of single machine scheduling with controllable processing times (SCPT) [41].

There are n jobs to be processed on a single machine, where the processing times of the jobs are decision variables. For job j, the processing time p j ∈ [0, α j], j = 1, 2, …, n. To schedule the jobs, one needs to identify the optimal values of the processing times p j and then order the jobs so that the sum of the total weighted completion time \(\sum _{j=1}^nw_jC_j\) and the total weighted processing time compression \(\sum _{j=1}^nv_j(\alpha _j-p_j)\) is minimized, where C j denotes the completion time of job j. This scheduling problem, denoted SCPT, was first studied by Vickson [78, 79], and the half-product QUBO formulation discussed here is from [41]. Vickson [78] showed that there exists an optimal processing time vector p = (p 1, p 2, …, p n) such that p j = 0 or p j = α j for every j. Combining this observation with the well-known shortest weighted processing time (SWPT) rule [76], we can see that there exists an optimal solution to SCPT in which the jobs with processing times p j = α j are sequenced in non-decreasing order of the ratios \(\frac {\alpha _j}{w_j}\) [41]. With this observation, define the decision variables

$$\displaystyle \begin{aligned} x_j = \begin{cases} 0 &\text{ if}\ p_j=0\\ 1 &\text{ if}\ p_j=\alpha_j \end{cases} \end{aligned}$$

The problem SCPT is thus completely specified by the vectors w = (w 1, w 2, …, w n), α = (α 1, α 2, …, α n), and v = (v 1, v 2, …, v n). Assuming the jobs are indexed in non-decreasing order of the ratios \(\frac {\alpha _j}{w_j}\), the objective function of SCPT is to minimize f(x), where

$$\displaystyle \begin{aligned} f(\mathbf{x}) &= \sum_{j=1}^nw_jx_j\sum_{i=1}^j\alpha_ix_i + \sum_{j=1}^nv_j\alpha_j(1-x_j)\\ &=\sum_{1\leq i < j\leq n}\alpha_iw_jx_ix_j - \sum_{j=1}^n\alpha_j(v_j-w_j)x_j+\sum_{j=1}^nv_j\alpha_j \end{aligned} $$

Note that \(\sum _{j=1}^nv_j\alpha _j\) is a constant and can be discarded. Choosing a i = α i, b i = −w i, and c i = α i(v i − w i), we see that the minimization of f(x) can be accomplished by solving the half-product QUBO.
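The algebraic identity above is easy to sanity-check numerically. The following sketch draws hypothetical random data w, α, v, sorts the jobs by the SWPT-based rule, and verifies that the direct scheduling objective agrees with its half-product expansion on a random binary vector; all data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
alpha = rng.uniform(1, 5, n)      # upper bounds on processing times
w = rng.uniform(1, 3, n)         # completion-time weights
v = rng.uniform(1, 3, n)         # compression weights

# Index jobs in non-decreasing order of alpha_j / w_j, per the SWPT-based rule.
order = np.argsort(alpha / w)
alpha, w, v = alpha[order], w[order], v[order]

def f_direct(x):
    # sum_j w_j x_j (sum_{i<=j} alpha_i x_i) + sum_j v_j alpha_j (1 - x_j)
    comp = np.cumsum(alpha * x)
    return np.sum(w * x * comp) + np.sum(v * alpha * (1 - x))

def f_half_product(x):
    # sum_{i<j} alpha_i w_j x_i x_j - sum_j alpha_j (v_j - w_j) x_j + const
    total = np.sum(v * alpha) - np.sum(alpha * (v - w) * x)
    for j in range(n):
        total += w[j] * x[j] * np.sum(alpha[:j] * x[:j])
    return total

x = rng.integers(0, 2, n)
assert abs(f_direct(x) - f_half_product(x)) < 1e-9
```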

1.7 Mixed Integer Programming Formulations

Various mixed integer linear programming (MILP) formulations of QUBO are discussed in detail in other chapters of this book. Most of the basic formulations are based on writing the product of two variables (not necessarily both binary) as a new variable and forcing the definition of the underlying product by imposing additional constraints, which may or may not exploit the sign of the q ij values. Early research on these types of reductions goes back to the 1960s [13, 20, 29, 80, 81], where the product of two binary variables is ‘linearized’. Later, Glover [25, 26] and Glover and Woolsey [27] considered the linearization of the product of a binary variable and a continuous variable. Most of these linearization techniques can be presented under a general approximation framework introduced by McCormick [59, 60].

Consider the bounded variables

$$\displaystyle \begin{aligned}l_1 \leq x_1 \leq u_1 \mbox{ and } l_2 \leq x_2 \leq u_2.\end{aligned}$$

Then,

$$\displaystyle \begin{aligned} (x_1-l_1)(u_2-x_2)\geq 0,\quad & (u_1-x_1)(x_2-l_2)\geq 0\\ (x_1-l_1)(x_2-l_2)\geq 0,\quad &\mbox{ and } (u_1-x_1)(u_2-x_2)\geq 0. \end{aligned} $$

Expanding and rearranging the terms, we have

$$\displaystyle \begin{aligned} x_1x_2 &\leq l_1x_2+u_2x_1-l_1u_2\\ x_1x_2 &\leq l_2x_1+u_1x_2-l_2u_1\\ x_1x_2 &\geq l_1x_2+l_2x_1-l_1l_2\\ x_1x_2 &\geq u_2x_1+u_1x_2-u_1u_2 \end{aligned} $$

Now, replacing the term x 1 x 2 by a new variable w in the above inequalities, we get

$$\displaystyle \begin{aligned}w &\leq l_1x_2+u_2x_1-l_1u_2 \end{aligned} $$
(1.21)
$$\displaystyle \begin{aligned}w &\leq l_2x_1+u_1x_2-l_2u_1 \end{aligned} $$
(1.22)
$$\displaystyle \begin{aligned}w &\geq l_1x_2+l_2x_1-l_1l_2 \end{aligned} $$
(1.23)
$$\displaystyle \begin{aligned}w &\geq u_2x_1+u_1x_2-u_1u_2 \end{aligned} $$
(1.24)

The inequalities (1.21) and (1.22) are called the upper McCormick envelopes and (1.23) and (1.24) are called the lower McCormick envelopes [59, 60]. The variable w along with the McCormick envelope inequalities provides an approximation of the product x 1 x 2. The approximation becomes exact when one of the variables x 1 or x 2 is binary, and it then coincides with the linearizations proposed by Glover [25, 26] and Glover and Woolsey [27]. To see the validity of the linearization, without loss of generality, assume x 2 is binary, so that l 2 = 0 and u 2 = 1. When x 2 = 0, the inequalities (1.22) and (1.23) guarantee that w = 0. Likewise, when x 2 = 1, inequalities (1.21) and (1.24) guarantee that w = x 1. The special case of the McCormick envelope inequalities involving one binary variable leads to a natural MILP formulation of QUBO, the well-known Glover’s linearization [25, 26]. When both variables involved in a product are binary, the McCormick envelope inequalities reduce to the Glover-Woolsey linearization [27]. We also want to highlight that linearizations of the product of two binary variables were reported in the literature, either explicitly or implicitly, even prior to [27]. For example, Goldman [29] pointed out the works of Fortet [20, 21] and Dantzig [13] in this context.
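The binary-case argument is easy to check numerically. In the sketch below, with l 2 = 0 and u 2 = 1, the four envelope inequalities pin w to the product x 1 x 2 whenever x 2 is binary, while at a fractional x 2 they only bracket it; the bounds l 1 = −2 and u 1 = 3 are arbitrary illustrative choices.

```python
import numpy as np

def mccormick_bounds(x1, x2, l1, u1, l2, u2):
    """Tightest interval for w implied by inequalities (1.21)-(1.24)."""
    upper = min(l1 * x2 + u2 * x1 - l1 * u2,    # (1.21)
                l2 * x1 + u1 * x2 - l2 * u1)    # (1.22)
    lower = max(l1 * x2 + l2 * x1 - l1 * l2,    # (1.23)
                u2 * x1 + u1 * x2 - u1 * u2)    # (1.24)
    return lower, upper

l1, u1, l2, u2 = -2.0, 3.0, 0.0, 1.0            # x2 plays the binary role
for x2 in (0.0, 1.0):                           # binary x2: envelope is exact
    for x1 in np.linspace(l1, u1, 7):
        lo, hi = mccormick_bounds(x1, x2, l1, u1, l2, u2)
        assert abs(lo - x1 * x2) < 1e-9 and abs(hi - x1 * x2) < 1e-9

lo, hi = mccormick_bounds(0.5, 0.5, l1, u1, l2, u2)
print(lo, hi)   # a strict interval around 0.25: exactness fails at fractional x2
```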

Replace the product x i x j with a new variable w ij in the objective function of QUBO and force the equality x i x j = w ij by introducing the McCormick envelope inequalities. The resulting MILP formulation, known as the Glover-Woolsey linearization or the standard linearization, is given below.

$$\displaystyle \begin{aligned} \mbox{GW: Maximize }&\sum_{i=1}^n\sum_{j=1}^n q_{ij}w_{ij}+\sum_{j=1}^nc_jx_j\\ \mbox{Subject to}\quad & w_{ij} -x_i \leq 0 \mbox{ for } i,j\in \{1,2,\ldots ,n\} \end{aligned} $$
(1.25)
$$\displaystyle \begin{aligned}& w_{ij} -x_j \leq 0 \mbox{ for } i,j\in \{1,2,\ldots ,n\} \end{aligned} $$
(1.26)
$$\displaystyle \begin{aligned}& x_i+x_j-w_{ij} \leq 1 \mbox{ for } i,j\in \{1,2,\ldots ,n\} \end{aligned} $$
(1.27)
$$\displaystyle \begin{aligned}&w_{ij} \geq 0 \mbox{ for } i,j\in \{1,2,\ldots ,n\} \end{aligned} $$
(1.28)
$$\displaystyle \begin{aligned}&x_i\in \{0,1\} \mbox{ for } i=1,2\ldots ,n. \end{aligned} $$
(1.29)
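As a concrete illustration, the sketch below assembles GW with the open-source PuLP modeling library. This is a minimal sketch, assuming Q is given as a dense list of lists with its diagonal folded into c (as discussed earlier in this chapter); it is not code from the cited literature.

```python
import pulp

def build_gw(Q, c):
    """Glover-Woolsey linearization (1.25)-(1.29) of the instance (Q, c, max)."""
    n = len(c)
    prob = pulp.LpProblem("GW", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x_{i}", cat="Binary") for i in range(n)]
    w = {(i, j): pulp.LpVariable(f"w_{i}_{j}", lowBound=0)     # (1.28)
         for i in range(n) for j in range(n) if i != j}
    prob += (pulp.lpSum(Q[i][j] * w[i, j] for (i, j) in w)
             + pulp.lpSum(c[i] * x[i] for i in range(n)))
    for (i, j) in w:
        prob += w[i, j] - x[i] <= 0           # (1.25)
        prob += w[i, j] - x[j] <= 0           # (1.26)
        prob += x[i] + x[j] - w[i, j] <= 1    # (1.27)
    return prob, x, w

# Example usage (solves with PuLP's bundled CBC solver):
# prob, x, w = build_gw([[0, 3, -2], [0, 0, 4], [0, 0, 0]], [1, -1, 2])
# prob.solve()
```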

Recall the definition of the sets P and N introduced in Sect. 1.4.3. For (i, j) ∈ P the maximization drives w ij toward its upper bound, so constraints (1.27) and (1.28) can be removed; for (i, j) ∈ N it drives w ij toward its lower bound, so constraints (1.25) and (1.26) can be removed. Thus GW becomes the reduced Glover-Woolsey linearization (RGW)

$$\displaystyle \begin{aligned} \mbox{RGW: Maximize }&\sum_{(i,j)\in P\cup N} q_{ij}w_{ij}+\sum_{j=1}^nc_jx_j\\ \mbox{Subject to}\quad & w_{ij} -x_i \leq 0 \mbox{ for } (i,j)\in P \end{aligned} $$
(1.30)
$$\displaystyle \begin{aligned}& w_{ij} - x_j\leq 0 \mbox{ for } (i,j)\in P \end{aligned} $$
(1.31)
$$\displaystyle \begin{aligned}& x_i+x_j-w_{ij} \leq 1 \mbox{ for } (i,j)\in N \end{aligned} $$
(1.32)
$$\displaystyle \begin{aligned}&w_{ij} \geq 0 \mbox{ for } (i,j)\in N \end{aligned} $$
(1.33)
$$\displaystyle \begin{aligned}&x_i\in \{0,1\} \mbox{ for } i=1,2,\ldots ,n. \end{aligned} $$
(1.34)
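The reduction is mechanical once the sign pattern of Q is known. Reusing the pulp import from the previous sketch, a hypothetical RGW builder might look as follows; again a sketch under the same assumptions, not code from the literature.

```python
def build_rgw(Q, c):
    """Reduced Glover-Woolsey linearization (1.30)-(1.34); P and N as in
    Sect. 1.4.3 (pairs with positive and negative q_ij, respectively)."""
    n = len(c)
    prob = pulp.LpProblem("RGW", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x_{i}", cat="Binary") for i in range(n)]
    P = [(i, j) for i in range(n) for j in range(n) if i != j and Q[i][j] > 0]
    N = [(i, j) for i in range(n) for j in range(n) if i != j and Q[i][j] < 0]
    w = {(i, j): pulp.LpVariable(f"w_{i}_{j}") for (i, j) in P + N}
    prob += (pulp.lpSum(Q[i][j] * w[i, j] for (i, j) in P + N)
             + pulp.lpSum(c[i] * x[i] for i in range(n)))
    for (i, j) in P:
        prob += w[i, j] - x[i] <= 0           # (1.30)
        prob += w[i, j] - x[j] <= 0           # (1.31)
    for (i, j) in N:
        prob += x[i] + x[j] - w[i, j] <= 1    # (1.32)
        prob += w[i, j] >= 0                  # (1.33)
    return prob, x, w
```

Pairs with q ij = 0 contribute nothing to the objective and therefore need no w variable at all.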

The convex hull of solutions of GW is called the Boolean quadric polytope [64], which is discussed in detail in Chap. 4. Various other MILP formulations of QUBO are known in the literature and most of them use the Glover-Woolsey linearization or Glover’s linearization in one form or another. For a thorough discussion of this, we refer to Chap. 6 and the recent papers [67, 68].

The continuous relaxations of GW and RGW, denoted CGW and CRGW respectively, are obtained by replacing constraint (1.29) (respectively constraint (1.34)) by x i ∈ [0, 1] for i = 1, 2, …, n. CGW and CRGW are linear programs and hence can be solved using general purpose linear programming solvers. However, their special structure also brings additional structural properties and efficient algorithms. Some of these connections and linkages with roof duality are discussed in Chap. 5.

The polytope defined by the constraints of CGW is called the fractional Boolean quadric polytope. Let us first present a useful necessary condition for a solution (x, w) to be an extreme point of the fractional Boolean quadric polytope.

Lemma 1.5

If (x, w) is an extreme point of the fractional Boolean quadric polytope then \(w_{ij}\in \{0, \min \{x_i,x_j\}, x_i+x_j-1\}\) for all i, j.

Proof

Suppose (x, w) is an extreme point of the fractional Boolean quadric polytope. Then we can choose a cost matrix Q and a cost vector c such that (x, w) is an optimal solution to CGW. Without loss of generality, assume that q ij ≠ 0 for i ≠ j and c i ≠ 0 for all i. Clearly, by feasibility, \(0\leq w_{ij} \leq \min \{x_i,x_j\}\) for every i, j, i ≠ j. If possible, let \(0 < w_{hk} < \min \{x_h,x_k\}\) for some h, k, h ≠ k with x h + x k − 1 ≠ w hk. Define the solution \((\mathbf {x},\hat {\mathbf {w}})\) where

$$\displaystyle \begin{aligned} \hat{w}_{ij} = \begin{cases} w_{ij}+\epsilon &\text{ if } (i,j)=(h,k)\\ w_{ij} &\text{ Otherwise,} \end{cases} \end{aligned}$$

where 𝜖 ≠ 0. If q hk > 0, then \((\mathbf {x},\hat {\mathbf {w}})\) is a feasible solution to CGW for sufficiently small 𝜖 > 0 which is better than (x, w), contradicting the optimality of (x, w). Similarly, if q hk < 0, then \((\mathbf {x},\hat {\mathbf {w}})\) is a feasible solution to CGW for 𝜖 < 0 of sufficiently small magnitude which is better than (x, w), contradicting the optimality of (x, w). This completes the proof. □

We now prove the half-integrality property of extreme points of the fractional Boolean quadric polytope [64].

Theorem 1.3

If (x, w) is an extreme point of the fractional Boolean quadric polytope, then \(x_i\in \{0,\frac {1}{2},1\}\) for all i and \(w_{ij}\in \{0,\frac {1}{2},1\}\) for all (i, j) ∈ P ∪ N.

Proof

Let \(\mathfrak {F}\) denote the fractional Boolean quadric polytope and (x, w) be an extreme point. Let

$$\displaystyle \begin{aligned}E^1=\{i : 0 < x_i < \frac{1}{2}\} \mbox{ and } E^2=\{i : \frac{1}{2} < x_i < 1\}. \end{aligned}$$

If E 1 ∪ E 2 = ∅ then from Lemma 1.5, (x, w) is of the required type. So assume that E 1 ∪ E 2 ≠ ∅. Now construct the solutions (x 1, w 1) and (x 2, w 2) as follows: For 𝜖 > 0 let

$$\displaystyle \begin{aligned} x^1_i = \begin{cases} x_i+\epsilon &\text{ if } i\in E^1\\ x_i-\epsilon &\text{ if } i\in E^2\\ x_i &\text{ Otherwise} \end{cases} \quad \mathrm{and}\quad x^2_i = \begin{cases} x_i-\epsilon &\text{ if } i\in E^1\\ x_i+\epsilon &\text{ if } i\in E^2\\ x_i &\text{ Otherwise.} \end{cases} \end{aligned}$$

From Lemma 1.5, \(w_{ij}\in \{0, \min \{x_i,x_j\}, x_i+x_j-1\}\). Define

$$\displaystyle \begin{aligned} w^1_{ij} = \begin{cases} \min\{x^1_i,x^1_j\} &\text{ if } w_{ij}=\min\{x_i,x_j\}\\ x^1_i+x^1_j-1 &\text{ if } w_{ij}=x_i+x_j-1\\ 0 &\text{ Otherwise} \end{cases} \end{aligned}$$
$$\displaystyle \begin{aligned} w^2_{ij} = \begin{cases} \min\{x^2_i,x^2_j\} &\text{ if } w_{ij}=\min\{x_i,x_j\}\\ x^2_i+x^2_j-1 &\text{ if } w_{ij}=x_i+x_j-1\\ 0 &\text{ Otherwise} \end{cases} \end{aligned}$$

For sufficiently small 𝜖 > 0, (x 1, w 1) and (x 2, w 2) are feasible solutions. It can be verified that \((\mathbf {x},\mathbf {w})=\frac {1}{2}\{({\mathbf {x}}^{1},{\mathbf {w}}^{1})+({\mathbf {x}}^{2},{\mathbf {w}}^{2})\}\), contradicting the fact that (x, w) is an extreme point, and the result follows. □
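Theorem 1.3 can also be observed empirically: solving CGW for random data with a simplex-type LP solver returns an extreme point, whose entries should be half-integral. Below is a minimal sketch using scipy; the instance data are random and purely illustrative, and for brevity only the pairs i < j carry w variables (assuming Q upper triangular with zero diagonal).

```python
import itertools
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n = 4
pairs = list(itertools.combinations(range(n), 2))
q = {p: rng.normal() for p in pairs}        # random upper-triangular Q
c = rng.normal(size=n)

# Variable vector: [x_0, ..., x_{n-1}, w_p for p in pairs]; linprog
# minimizes, so the CGW objective is negated.
obj = -np.concatenate([c, [q[p] for p in pairs]])

A, b = [], []
for k, (i, j) in enumerate(pairs):
    for coeffs, rhs in (
        ({i: -1.0, n + k: 1.0}, 0.0),           # w_ij - x_i <= 0
        ({j: -1.0, n + k: 1.0}, 0.0),           # w_ij - x_j <= 0
        ({i: 1.0, j: 1.0, n + k: -1.0}, 1.0),   # x_i + x_j - w_ij <= 1
    ):
        row = np.zeros(n + len(pairs))
        for idx, val in coeffs.items():
            row[idx] = val
        A.append(row)
        b.append(rhs)

bounds = [(0, 1)] * n + [(0, None)] * len(pairs)
res = linprog(obj, A_ub=np.array(A), b_ub=b, bounds=bounds, method="highs")
assert np.allclose(2 * res.x, np.round(2 * res.x), atol=1e-6)
print(res.x)   # every entry lies in {0, 1/2, 1}
```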

In Sect. 1.4 we observed that the maximum cut problem and QUBO are equivalent. In this sense, an integer programming formulation of the maximum cut problem provides an indirect integer programming representation of QUBO. Consider a complete graph K n = (V, E) and let q ij be the weight of edge (i, j) ∈ E. Introducing the binary variable y ij, which takes value 1 if and only if edge (i, j) crosses the cut, the maximum cut problem in K n can be written as the MILP

$$\displaystyle \begin{aligned} \mbox{MCUT: Maximize }&\sum_{(i,j)\in E} q_{ij}y_{ij}\\ \mbox{Subject to}\quad & y_{ij}+y_{ik}+y_{jk} \leq 2 \mbox{ for all distinct } i,j,k\in V\\ & y_{ij}-y_{ik}-y_{jk} \leq 0 \mbox{ for all distinct } i,j,k\in V\\ & y_{ij}\in \{0,1\} \mbox{ for } (i,j)\in E \end{aligned} $$

The convex hull of feasible solutions of MCUT is called the cut polytope. The relationship between the Boolean quadric polytope and the cut polytope is discussed in Chap. 4.

1.8 Conclusion

The primary goal of this chapter was to give a brief introduction to QUBO along with basic definitions and motivating examples. Each of the chapters that follow is more focused, dealing with a specific aspect of QUBO and providing a comprehensive and in-depth analysis of the topic.