1 Introduction

Automated planning is the reasoning side of acting in Artificial Intelligence  [23]. Planning automates the selection and ordering of actions to reach desired states of the world. An automated planning problem represents the real-world dynamics using a model of the world, which can either be manually encoded  [7, 13, 14, 20, 24], or learned from data  [1, 2, 12, 29]. In this paper, we focus on the latter.

Automated planning with deep neural network (DNN) learned state transition models is a two-stage data-driven framework for learning and solving planning problems with unknown state transition models  [28]. The first stage of the framework learns the unknown state transition model from data as a DNN. The second stage plans optimally with respect to the learned DNN model by solving an equivalent mathematical optimization problem (e.g., a mixed-integer programming (MIP) model  [28], a 0–1 integer programming (IP) model  [25, 26], or a weighted partial maximum satisfiability (WP-MaxSAT) model  [25, 26]). In this paper, we focus on the theoretical, modelling and experimental aspects of the second stage of this data-driven framework, where the learned DNN is a binarized neural network (BNN)  [16].

We study the complexity of finding a feasible solution to an automated planning problem with a learned BNN transition model under the common assumption that the learned BNN is fully connected, and show that this problem is NP-complete. In terms of mathematical modelling, we propose two new constraint programming (CP) models that are motivated by the work on learning BNNs with CP  [33]. We then conduct two sets of experiments with the previous and our new mathematical optimization models for the learned planning problem. In our first set of experiments, we focus on solving the existing learned planning problem instances using off-the-shelf solvers for WP-MaxSAT  [6], MIP  [17], pseudo-Boolean optimization (PBO)  [10] and CP  [17]. Our results show that the PBO solver RoundingSat  [10] outperforms the existing baselines by one to two orders of magnitude. In our second set of experiments, we focus on the challenging task of solving learned planning problems over long planning horizons. Here, we study and test the effect of specialized symmetry reasoning over the different time steps of the learned planning problem. Our preliminary results demonstrate that exploiting this symmetry can reduce the overall runtime of the underlying solver (i.e., RoundingSat) by up to three orders of magnitude. Overall, this paper makes both theoretical and practical contributions to the field of data-driven automated planning with learned BNN transition models.

In the next section we formally define the planning problem with binarized neural network (BNN) transition functions. In Sect. 3 we define a 0–1 integer programming (IP) model that solves the planning problem given a learned BNN. In Sect. 4 we show that the feasibility problem is NP-complete. In Sect. 5 we give two constraint programming models for solving the planning problem. In Sect. 6 we discuss a particular symmetry property of the model, and discuss how to take advantage of it. In Sect. 7 we give experimental results. Finally, in Sect. 8 we conclude and discuss future work.

2 Planning with Learned BNN Transition Models

We begin by presenting the definition of the learned automated planning problem and the BNN architecture used for learning the transition model from data.

2.1 Problem Definition

A fixed-horizon learned deterministic automated planning problem  [25, 28] is a tuple \(\tilde{\varPi } = \langle S,A,C,\tilde{T},V,G,R,H \rangle \), where \(S=\{s_1, \dots ,s_{n}\}\) and \(A=\{a_1, \dots ,a_{m}\}\) are sets of state and action variables for positive integers \(n, m\) with domains \(D_{s_1}, \dots , D_{s_{n}} \) and \(D_{a_1}, \dots , D_{a_{m}}\) respectively, \(C: D_{s_1} \times \dots \times D_{s_{n}} \times D_{a_1} \times \dots \times D_{a_{m}} \rightarrow \{ true , false \} \) is the global function, \(\tilde{T}: D_{s_1} \times \dots \times D_{s_{n}} \times D_{a_1} \times \dots \times D_{a_{m}} \rightarrow D_{s_1} \times \dots \times D_{s_{n}}\) denotes the learned state transition function, and \(R: D_{s_1} \times \dots \times D_{s_{n}} \times D_{a_1} \times \dots \times D_{a_{m}} \rightarrow \mathbb {R}\) is the reward function. Further, V is a tuple of constants \(\langle V_1,\dots , V_{n} \rangle \in D_{s_1} \times \dots \times D_{s_{n}}\) that denotes the initial values of all state variables, \(G: D_{s_1} \times \dots \times D_{s_{n}} \rightarrow \{ true , false \} \) is the goal state function, and \(H\in \mathbb {Z}^{+}\) is the planning horizon.

A solution to (i.e., a plan for) \(\tilde{\varPi }\) is a tuple of values \(\bar{A}^{t} = \langle \bar{a}^{t}_{1}, \dots , \bar{a}^{t}_{m} \rangle \in D_{a_1} \times \dots \times D_{a_{m}}\) for all action variables A over time steps \(t\in \{1,\dots , H\}\) such that \(\tilde{T}(\langle \bar{s}^{t}_{1}, \dots , \bar{s}^{t}_{n}, \bar{a}^{t}_{1}, \dots , \bar{a}^{t}_{m} \rangle ) = \langle \bar{s}^{t+1}_{1}, \dots , \bar{s}^{t+1}_{n} \rangle \) and \(C(\langle \bar{s}^{t}_{1}, \dots , \bar{s}^{t}_{n}, \bar{a}^{t}_{1}, \dots , \bar{a}^{t}_{m} \rangle ) = true \) for time steps \(t\in \{1,\dots , H\}\), \(V_i = \bar{s}^{1}_{i}\) for all \(s_i \in S\) and \(G(\langle \bar{s}^{H+1}_{1}, \dots , \bar{s}^{H+1}_{n} \rangle ) = true \). An optimal solution to \(\tilde{\varPi }\) is a solution such that the total reward \(\sum _{t = 1}^{H} R(\langle \bar{s}^{t+1}_{1}, \dots , \bar{s}^{t+1}_{n}, \bar{a}^{t}_{1}, \dots , \bar{a}^{t}_{m} \rangle )\) is maximized.

It is assumed that the functions C, G, R and \(\tilde{T}\) are known, that C and G can be equivalently represented by a finite set of linear constraints, that R is a linear expression, and that \(\tilde{T}\) is a learned binarized neural network  [16]. Next, we give an example planning problem where these assumptions are demonstrated.

Example 1

A simple instance of a learned automated planning problem \(\tilde{\varPi }\) is as follows.

  • The set of state variables is defined as \(S = \{s_1\}\) where \(s_1 \in \{0,1\}\).

  • The set of action variables is defined as \(A = \{a_1\}\) where \(a_1 \in \{0,1\}\).

  • The global function C is defined as \(C(\langle s_1, a_1 \rangle ) = true \) when \(s_1 + a_1 \le 1\).

  • The value of the state variable \(s_1\) is \(V_1 = 0\) at time step \(t=1\).

  • The goal state function G is defined as \(G(\langle s_1 \rangle ) = true \) if and only if \(s_1 = 1\).

  • The reward function R is defined as \(R(\langle s_1, a_1 \rangle ) = -a_1\).

  • The learned state transition function \(\tilde{T}\) is in the form of a BNN, which will be described below.

  • The planning horizon is \(H = 4\).

A plan (assuming the BNN described later in Fig. 1) is \(\bar{a}_{1}^{1} = 1\), \(\bar{a}_{1}^{2} = 1\), \(\bar{a}_{1}^{3} = 1\), \(\bar{a}_{1}^{4} = 0\) with corresponding states \(\bar{s}_{1}^{1} = 0\), \(\bar{s}_{1}^{2} = 0\), \(\bar{s}_{1}^{3} = 0\), \(\bar{s}_{1}^{4} = 0\), \(\bar{s}_{1}^{5} = 1\). The total reward for the plan is \(-3\).    \(\square \)

2.2 Binarized Neural Networks

Binarized neural networks (BNNs) are neural networks with binary weights and activation functions  [16]. As a result, BNNs can learn memory-efficient models by replacing most arithmetic operations with bit-wise operations. The fully-connected BNN that defines the learned state transition function \(\tilde{T}\) has L layers, where each layer \(l \in \{1,\dots ,L\}\) has width \(W_l\) and a set of neurons \(J(l)=\{u_{1,l}, \dots ,u_{W_l,l}\}\), and is stacked in the following order.

Input Layer. The first layer consists of neurons \(u_{i,1}\in J(1)\) that represent the domain of the learned state transition function \(\tilde{T}\). We will assume that the domains of action and state variables are binary, and let neurons \(u_{1,1}, \dots , u_{n,1} \in J(1)\) represent the state variables S and neurons \(u_{n+1,1}, \dots ,u_{n+m,1} \in J(1)\) represent the action variables A. During the training of the BNN, binary values 0 and 1 of action and state variables are represented by \(-1\) and 1, respectively.

Batch Normalization Layers. For layers \(l \in \{2,\dots ,L\}\), Batch Normalization  [18] transforms the weighted sum \(\triangle _{j,l} = \sum _{i\in J(l-1)} w_{i,j,l} y_{i,l-1}\) of the outputs of layer \(l-1\) into the input \(x_{j,l}\) of neuron \(u_{j,l} \in J(l)\) using the formula \(x_{j,l} = \frac{\triangle _{j,l} - \mu _{j,l}}{\sqrt{\sigma ^{2}_{j,l} + \epsilon _{j,l}}}\gamma _{j,l} + \beta _{j,l}\), where \(y_{i,l-1}\) denotes the output of neuron \(u_{i,l-1}\in J(l-1)\), and the parameters are the weight \(w_{i,j,l}\), input mean \(\mu _{j,l}\), input variance \(\sigma ^{2}_{j,l}\), numerical stability constant \(\epsilon _{j,l}\), input scaling \(\gamma _{j,l}\), and input bias \(\beta _{j,l}\), all computed at training time.

Activation Layers. Given input \(x_{j,l}\), the deterministic activation function computes the output \(y_{j,l}\) of neuron \(u_{j,l}\in J(l)\) at layer \(l\in \{2,\dots ,L\}\), which is 1 if \(x_{j,l}\ge 0\) and \(-1\) otherwise. The last activation layer consists of neurons \(u_{i,L}\in J(L)\) that represent the codomain of the learned state transition function \(\tilde{T}\). We assume neurons \(u_{1,L}, \dots ,u_{n,L} \in J(L)\) represent the state variables S.

The proposed BNN architecture is trained to learn the function \(\tilde{T}\) from data that consists of measurements on the domain and codomain of the unknown state transition function \(T: D_{s_1} \times \dots \times D_{s_{n}} \times D_{a_1} \times \dots \times D_{a_{m}} \rightarrow D_{s_1} \times \dots \times D_{s_{n}}\). An example learned BNN for the problem of Example 1 is visualized in Fig. 1.

Fig. 1. Learned BNN with two layers (\(L=2\)) for the problem in Example 1. In this example learned BNN, the input layer J(1) has neurons \(u_{1,1}\) and \(u_{2,1}\) representing \(s_1\) and \(a_1\), respectively. The node \(n_{2,1}\) represents batch normalization for neuron \(u_{2,1}\). Given the parameter values \(w_{1,1,2} = 1\), \(w_{2,1,2} = -1\), \(\mu _{1,2} = 0\), \(\sigma ^{2}_{1,2} = 2\), \(\epsilon _{1,2} = 2\), \(\gamma _{1,2} = 3\) and \(\beta _{1,2} = 1\), the input \(x_{1,2}\) to neuron \(u_{1,2}\) is calculated according to the formula specified in Sect. 2.2.
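
To make the layer definitions concrete, the following Python sketch (ours, not part of the paper) evaluates the Fig. 1 BNN on all four inputs, using the batch normalization formula of Sect. 2.2 and the parameter values from the caption.

```python
import math

# Forward pass of the Fig. 1 BNN (L = 2). Binary values 0/1 are mapped to
# the -1/+1 encoding of Sect. 2.1 before entering the network.
W = {1: 1, 2: -1}                              # w_{1,1,2} = 1, w_{2,1,2} = -1
MU, SIGMA2, EPS, GAMMA, BETA = 0, 2, 2, 3, 1   # batch normalization parameters

def transition(s1, a1):
    y = {1: 2 * s1 - 1, 2: 2 * a1 - 1}         # outputs of input neurons u_{1,1}, u_{2,1}
    delta = sum(W[i] * y[i] for i in (1, 2))   # weighted sum into neuron u_{1,2}
    x = (delta - MU) / math.sqrt(SIGMA2 + EPS) * GAMMA + BETA
    return 1 if x >= 0 else 0                  # sign activation, mapped back to {0,1}

for s1 in (0, 1):
    for a1 in (0, 1):
        print(f"s1={s1}, a1={a1} -> s1'={transition(s1, a1)}")
```

Running the sketch shows that \(s_1\) becomes 1 after every input except \(\langle s_1, a_1 \rangle = \langle 0, 1 \rangle \), which is exactly the behaviour traced by the plan in Example 1.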

3 0–1 Integer Programming Model for the Learned Planning Problem

In this section, we present the 0–1 integer programming (IP) model from  [25, 26] previously used to solve learned automated planning problems. A 0–1 IP model can be solved optimally by a mixed-integer programming (MIP) solver (as was previously investigated  [25, 26]). Equivalently, this can be viewed as a pseudo-Boolean optimization (PBO) model to be solved using a PBO solver, since all the variables are 0–1 or equivalently Boolean.

Decision Variables. The 0–1 IP model uses the following decision variables:

  • \({X}_{i,t}\) encodes whether action \(a_i\in A\) is executed at time step \(t\in \{1,\dots ,H\}\) or not.

  • \({Y}_{i,t}\) encodes whether state variable \(s_i\in S\) is true at time step \(t\in \{1,\dots ,H+1\}\) or not.

  • \({Z}_{i,l,t}\) encodes whether neuron \(u_{i,l}\in J(l)\) in layer \(l\in \{1,\dots ,L\}\) is activated at time step \(t\in \{1,\dots ,H\}\) or not.

Parameters. The 0–1 IP model uses the following parameters:

  • \(\bar{w}_{i,j,l}\) is the value of the learned BNN weight between neurons \(u_{i,l-1}\in J(l-1)\) and \(u_{j,l}\in J(l)\) in layer \(l\in \{2,\dots ,L\}\).

  • \(B(j,l)\) is the bias of neuron \(u_{j,l}\in J(l)\) in layer \(l\in \{2,\dots ,L\}\). Given the values of normalization parameters \(\bar{\mu }_{j,l}\), \(\bar{\sigma }^{2}_{j,l}\), \(\bar{\epsilon }_{j,l}\), \(\bar{\gamma }_{j,l}\) and \(\bar{\beta }_{j,l}\), the bias is computed as \(B(j,l) = \biggl \lceil \frac{\bar{\beta }_{j,l} \sqrt{\bar{\sigma }^{2}_{j,l} + \bar{\epsilon }_{j,l}}}{\bar{\gamma }_{j,l}} - \bar{\mu }_{j,l} \biggr \rceil \). The calculation of the bias \(B(j,l)\) is visualized in Fig. 2.

Fig. 2. Visualization of the bias computation \(B(1,2)\) for neuron \(u_{1,2} \in J(2)\) in the example learned BNN presented in Fig. 1.
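
As a worked instance of this formula, plugging in the Fig. 1 parameter values gives

$$\begin{aligned} B(1,2) = \biggl \lceil \frac{\bar{\beta }_{1,2} \sqrt{\bar{\sigma }^{2}_{1,2} + \bar{\epsilon }_{1,2}}}{\bar{\gamma }_{1,2}} - \bar{\mu }_{1,2} \biggr \rceil = \biggl \lceil \frac{1 \cdot \sqrt{2 + 2}}{3} - 0 \biggr \rceil = \biggl \lceil \frac{2}{3} \biggr \rceil = 1. \end{aligned}$$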

Constraints. The 0–1 IP model has the following constraints:

$$\begin{aligned}&{Y}_{i,1} = V_{i}&\forall _{s_i \in S} \end{aligned}$$
(1)
$$\begin{aligned}&G(\langle {Y}_{1,H+1}, \dots , {Y}_{n,H+1} \rangle ) = true&\end{aligned}$$
(2)
$$\begin{aligned}&C(\langle {Y}_{1,t}, \dots , {Y}_{n,t}, {X}_{1,t}, \dots , {X}_{m,t} \rangle ) = true&\forall _{t\in \{1,\dots ,H\}} \end{aligned}$$
(3)
$$\begin{aligned}&{Y}_{i,t} = {Z}_{i,1,t}&\forall _{s_i \in S, t\in \{1,\dots ,H\}} \end{aligned}$$
(4)
$$\begin{aligned}&{X}_{i,t} = {Z}_{i+n,1,t}&\forall _{a_i \in A, t\in \{1,\dots ,H\}} \end{aligned}$$
(5)
$$\begin{aligned}&{Y}_{i,t+1} = {Z}_{i,L,t}&\forall _{s_i \in S, t\in \{1,\dots ,H\}} \end{aligned}$$
(6)
$$\begin{aligned}&\bigl ( B(j,l) - |J(l \!-\! 1)| \bigr ) \bigl ( 1 \!-\! {Z}_{j,l,t} \bigr ) \le In (j,l,t)&\forall _{u_{j,l} \in J(l), l\in \{2,\dots ,L\}, t\in \{1,\dots ,H\}} \end{aligned}$$
(7)
$$\begin{aligned}&\bigl ( B(j,l) \!+\! |J(l \!-\! 1)| \!+\! 1 \bigr ) {Z}_{j,l,t} - 1 \ge In (j,l,t)&\forall _{u_{j,l} \in J(l), l\in \{2,\dots ,L\}, t\in \{1,\dots ,H\}} \end{aligned}$$
(8)

where the input expression \( In (j,l,t)\) for neuron \(u_{j,l}\in J(l)\) in layer \(l\in \{2,\dots ,L\}\) at time step \(t\in \{1,\dots ,H\}\) is equal to \(\sum _{u_{i,l-1}\in J(l-1)}{\bar{w}_{i,j,l}} (2 \cdot {Z}_{i,l-1,t} - 1) + B(j,l)\). In the above model, constraints (1) set the initial value of every state variable. Constraints (2)–(3) enforce the global constraints (i.e., constraints representing C) and the goal constraints (i.e., constraints representing G). Constraints (4)–(6) map the input and output layers of the learned BNN to the corresponding state and action variables. Finally, constraints (7)–(8) model the activation of each neuron in the learned BNN, where the decision variable \(Z_{j,l,t} \in \{0,1\}\) represents the output of neuron \(u_{j,l} \in J(l)\) at time step \(t\in \{1,\dots ,H\}\) using the expression \((2 \cdot {Z}_{j,l,t} - 1)\in \{-1,1\}\).
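
As a sanity check on constraints (7)–(8), note that the input expression is bounded by \(B(j,l) - |J(l-1)| \le In (j,l,t) \le B(j,l) + |J(l-1)|\). Substituting the two possible values of \({Z}_{j,l,t}\) gives

$$\begin{aligned} {Z}_{j,l,t} = 1&\implies In (j,l,t) \ge 0 \quad \text {(by constraint (7))}, \\ {Z}_{j,l,t} = 0&\implies In (j,l,t) \le -1 \quad \text {(by constraint (8))}, \end{aligned}$$

while the other constraint in each case relaxes to the corresponding bound, so \({Z}_{j,l,t} = 1\) holds exactly when the (integer-valued) neuron input is non-negative.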

Objective Function. The 0–1 IP model has the objective function

$$\begin{aligned}&\max \; \sum _{t = 1}^{H} R(\langle {Y}_{1,t+1}, \dots , {Y}_{n,t+1}, {X}_{1,t}, \dots , {X}_{m,t} \rangle ), \end{aligned}$$
(9)

which maximizes the total reward accumulated over time steps \(t\in \{1,\dots ,H\}\).

Example 2

The 0–1 IP (or the equivalent PBO) model that is presented in this section can be solved to find an optimal plan to the instance that is described in Example 1. The optimal plan is \(\bar{a}^{t}_{1} = 0\) for all time steps \(t\in \{1,2,3,4\}\), and the total reward for the optimal plan is 0.    \(\square \)
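
To make the compilation concrete, the following Python sketch (ours, not from the paper) instantiates the 0–1 IP model for the instance of Example 1. We use the open-source PuLP interface with its default CBC backend purely for illustration; the experiments in Sect. 7 use the solvers named there.

```python
from pulp import LpProblem, LpVariable, LpMaximize, lpSum, value, PULP_CBC_CMD

H = 4
B = 1                    # bias B(1,2) = ceil(1 * sqrt(2 + 2) / 3 - 0) = 1 (see Fig. 2)
W_S, W_A = 1, -1         # learned weights w_{1,1,2} and w_{2,1,2} from Fig. 1

m = LpProblem("learned_planning", LpMaximize)
X = [LpVariable(f"X_{t}", cat="Binary") for t in range(1, H + 1)]   # actions, t = 1..H
Y = [LpVariable(f"Y_{t}", cat="Binary") for t in range(1, H + 2)]   # states,  t = 1..H+1

m += Y[0] == 0                       # constraint (1): initial value V_1 = 0
m += Y[H] == 1                       # constraint (2): goal G requires s_1 = 1
for t in range(H):
    m += Y[t] + X[t] <= 1            # constraint (3): global function C
    # Since L = 2, constraints (4)-(6) identify Z_{1,2,t} with Y_{t+1}.
    In = W_S * (2 * Y[t] - 1) + W_A * (2 * X[t] - 1) + B
    m += (B - 2) * (1 - Y[t + 1]) <= In        # constraint (7), |J(1)| = 2
    m += (B + 2 + 1) * Y[t + 1] - 1 >= In      # constraint (8)

m += lpSum(-X[t] for t in range(H))  # objective (9): R = -a_1 at every step
m.solve(PULP_CBC_CMD(msg=False))
print([int(value(x)) for x in X], value(m.objective))  # [0, 0, 0, 0] with reward 0
```

The solution recovers the optimal plan of Example 2: no action is ever executed, and the BNN of Fig. 1 moves \(s_1\) to 1 by itself after the first time step.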

4 Theoretical Results

In this section, we establish the NP-completeness of finding feasible solutions to learned planning problems.

Theorem 1

Finding a feasible solution to a learned planning problem \(\tilde{\varPi }\) with a fully-connected batch normalized learned BNN \(\tilde{T}\) is an NP-complete problem.

Proof

We begin by showing that \(\tilde{\varPi }\) is in NP. Given the values \(\bar{A}^{t}\) of action variables A for all time steps \(t\in \{1,\dots ,H\}\) and the initial values \(V_i\) of state variables \(s_i \in S\), the learned BNN \(\tilde{T}\) can predict the values \(\bar{S}^{t} = \langle \bar{s}^{t}_{1}, \dots , \bar{s}^{t}_{n} \rangle \in D_{s_1}\times \dots \times D_{s_{n}}\) of all state variables S for all time steps \(t\in \{2,\dots , H+1\}\) in time linear in the size of the BNN and the value of the planning horizon H. Since C and G are represented by finite sets of linear constraints, checking that the resulting trajectory satisfies them also takes polynomial time, so a candidate plan can be verified in polynomial time.

Fig. 3. Visualization of the NP-hardness proof by a reduction from a 3-CNF formula \(\phi = \bigwedge _{j=1}^{q} c_j\) to the learned planning problem \(\tilde{\varPi }\). In the first layer, two neurons \(u_{2i,1}, u_{2i+1,1} \in J(1)\) together represent the Boolean variable \(z_i\) from the formula \(\phi \). When variable \(z_i\) does not appear in clause \(c_j\), the weights \(\bar{w}_{2i,j,2}, \bar{w}_{2i+1,j,2}\) are set so that the input to neuron \(u_{j,2} \in J(2)\) is cancelled out (i.e., case (a) of step (7)). In the remaining cases, the weights \(\bar{w}_{2i,j,2}, \bar{w}_{2i+1,j,2}\) are set to ensure that the input to neuron \(u_{j,2} \in J(2)\) is positive if and only if the respective literal that appears in clause \(c_j\) evaluates to true (e.g., case (c) of step (7) is visualized).

We proceed by showing that \(\tilde{\varPi }\) is NP-hard by a reduction from 3-SAT. Let \(\phi \) be a 3-CNF formula such that \(\phi = \bigwedge _{j=1}^{q} c_j\) for some positive integer q. Further let \(z_1, \dots , z_{r}\) denote the (Boolean) variables that appear in the formula \(\phi \) for some positive integer r. As visualized in Fig. 3, we define the learned planning problem \(\tilde{\varPi }\) to represent any 3-CNF formula \(\phi \) as follows:

  1. Planning horizon \(H=1\).

  2. State variable \(S=\{s_1\}\).

  3. Action variables \(A=\{a_1, \dots , a_{2r}\}\).

  4. The global function C is true if and only if \(a_{2i-1} = a_{2i}\) for all \(i \in \{1,\dots , r\}\).

  5. Neurons \(J(1) = \{u_{1,1}, \dots , u_{1+2r,1}\}\) in the first layer.

  6. Neurons \(J(2) = \{u_{1,2}, \dots , u_{q,2}\}\) in the second layer. Each neuron \(u_{i,2} \in J(2)\) is normalized so that \(B(i,2) = 3\).

  7. Set the learned weights between neurons \(u_{2i,1}, u_{2i+1,1} \in J(1) \backslash \{u_{1,1}\}\) and \(u_{j,2} \in J(2)\) according to the following rules: (a) if \(z_i\) does not appear in clause \(c_j\), set \(\bar{w}_{2i,j,2}=1, \bar{w}_{2i+1,j,2}=-1\); (b) else if the negation of \(z_i\) appears in clause \(c_j\) (i.e., \(\lnot z_i\)), set \(\bar{w}_{2i,j,2}=\bar{w}_{2i+1,j,2}=-1\); (c) else, set \(\bar{w}_{2i,j,2}=\bar{w}_{2i+1,j,2}=1\).

  8. Neuron \(J(3) = \{u_{1,3}\}\) in the third layer. Neuron \(u_{1,3}\) is normalized such that \(B(1,3) = -q\).

  9. Set the learned weights \(\bar{w}_{i,1,3}=1\) between \(u_{i,2} \in J(2)\) and \(u_{1,3} \in J(3)\).

  10. The goal state function G is defined as \(G(\langle 1 \rangle ) = true \) and \(G(\langle 0 \rangle ) = false \).

In the reduction presented above, step (1) sets the value of the planning horizon H to 1. Step (2) defines a single state variable \(s_1\) to represent whether the formula \(\phi \) is satisfied (i.e., \(s_1=1\)) or not (i.e., \(s_1=0\)). Step (3) defines action variables \(a_1, \dots , a_{2r}\) to represent the Boolean variables \(z_1, \dots , z_{r}\) in the formula \(\phi \). Step (4) ensures that each pair of action variables \(a_{2i-1}, a_{2i}\) that represents the same Boolean variable \(z_i\) takes the same value. Step (5) defines the neurons in the first layer (i.e., \(l=1\)) of the BNN. Step (6) defines the neurons in the second layer (i.e., \(l=2\)) of the BNN. Each neuron \(u_{i,2} \in J(2)\) represents a clause \(c_i\) in the formula \(\phi \), and the input of each neuron \(u_{i,2}\) is normalized so that \(B(i,2) = 3\). Step (7) defines the weights between the first and the second layers so that the outputs of neurons \(u_{2i,1}, u_{2i+1,1} \in J(1)\) affect the input of neuron \(u_{j,2} \in J(2)\) in the second layer if and only if the Boolean variable \(z_i\) appears in clause \(c_j\). When this is not the case, the outputs of neurons \(u_{2i,1}, u_{2i+1,1}\) cancel out due to the opposite values of their weights, so that \(\bar{w}_{2i,j,2} + \bar{w}_{2i+1,j,2} = 0\). Steps (6) and (7) together ensure that for any values of \(V_1\) and \(\bar{w}_{1,j,2}\), neuron \(u_{j,2} \in J(2)\) is activated if and only if at least one literal in clause \(c_j\) evaluates to true. Step (8) defines the single neuron in the third layer (i.e., \(l=3\)) of the BNN. Neuron \(u_{1,3} \in J(3)\) predicts the value of state variable \(s_1\). Step (9) defines the weights between the second and the third layers so that neuron \(u_{1,3} \in J(3)\) activates if and only if all clauses in the formula \(\phi \) are satisfied. Finally, step (10) ensures that the values of the actions constitute a solution to the learned planning problem \(\tilde{\varPi }\) if and only if all clauses are satisfied.    \(\square \)
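
The correctness of the reduction can be checked mechanically on small formulas. The following Python sketch (ours, not part of the paper) simulates the reduced BNN and verifies, by brute force, that the output neuron activates exactly on satisfying assignments.

```python
from itertools import product

def reduction_check(clauses, r):
    """Clauses are lists of non-zero integers, e.g. [1, -2, 3] for (z1 or not-z2 or z3).
    Checks that the layer-3 neuron outputs 1 iff the assignment satisfies all clauses."""
    q = len(clauses)
    for z in product([-1, 1], repeat=r):          # z_i encoded as -1 (false) / +1 (true)
        for v1 in (-1, 1):                        # any state input V_1 (cf. steps (6)-(7))
            layer2 = []
            for clause in clauses:
                delta = v1                        # state weight w_{1,j,2} = 1 (arbitrary)
                for i in range(1, r + 1):
                    if i in clause:               # case (c): weights +1, +1 -> +2 z_i
                        delta += 2 * z[i - 1]
                    elif -i in clause:            # case (b): weights -1, -1 -> -2 z_i
                        delta -= 2 * z[i - 1]
                    # case (a): weights +1, -1 cancel out
                layer2.append(1 if delta + 3 >= 0 else -1)     # B(j,2) = 3
            out = 1 if sum(layer2) - q >= 0 else 0             # B(1,3) = -q
            sat = all(any((lit > 0) == (z[abs(lit) - 1] > 0) for lit in clause)
                      for clause in clauses)
            assert out == int(sat)
    return True

print(reduction_check([[1, -2, 3], [-1, 2, 3], [-1, -2, -3]], r=3))  # True
```

Testing both values of \(V_1\) with the fixed weight \(\bar{w}_{1,j,2}=1\) also covers the weight \(-1\), since flipping the weight is equivalent to flipping \(V_1\).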

5 Constraint Programming Models for the Learned Planning Problem

In this section, we present two new constraint programming (CP) models to solve the learned automated planning problem \(\tilde{\varPi }\). The models make use of reification rather than restricting themselves to linear constraints. This allows a more direct expression of the BNN constraints.

5.1 Constraint Programming Model 1

Decision Variables and Parameters. The CP model 1 uses the same set of decision variables and parameters as the 0–1 IP model previously described in Sect. 3.

Constraints. The CP model 1 has the following constraints:

$$\begin{aligned}&\text {Constraints }(1){-}(6)&\nonumber \\&( In (j,l,t) \ge 0) = {Z}_{j,l,t}&\forall _{u_{j,l} \in J(l), l\in \{2,\dots ,L\}, t\in \{1,\dots ,H\}} \end{aligned}$$
(10)

where the input expression \( In (j,l,t)\) for neuron \(u_{j,l}\in J(l)\) in layer \(l\in \{2,\dots ,L\}\) at time step \(t\in \{1,\dots ,H\}\) is equal to \(\sum _{u_{i,l-1}\in J(l-1)}{\bar{w}_{i,j,l}} (2 \cdot {Z}_{i,l-1,t} - 1) + B(j,l)\). In the above model, constraint (10) models the activation of each neuron in the learned BNN by replacing constraints (7)–(8).
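
For illustration, constraint (10) maps directly onto reified linear constraints in off-the-shelf CP interfaces. The following Python sketch (ours; it uses Google OR-Tools CP-SAT rather than the CP Optimizer models evaluated in this paper, and the weights and bias are hypothetical) encodes the activation of a single neuron:

```python
from ortools.sat.python import cp_model

model = cp_model.CpModel()
weights, bias = [1, -1, 1], 1                      # hypothetical w_{i,j,l} and B(j,l)
z_in = [model.NewBoolVar(f"z_in_{i}") for i in range(len(weights))]
z_out = model.NewBoolVar("z_out")

# In(j,l,t) = sum_i w_i (2 z_i - 1) + B, exactly as in Sect. 3.
neuron_input = sum(w * (2 * z - 1) for w, z in zip(weights, z_in)) + bias

# Reified activation (10): z_out holds iff the neuron input is non-negative.
# Since the input is integer-valued, "In < 0" is encoded as "In <= -1".
model.Add(neuron_input >= 0).OnlyEnforceIf(z_out)
model.Add(neuron_input <= -1).OnlyEnforceIf(z_out.Not())

for z in z_in:
    model.Add(z == 1)   # fix the inputs: In = 1 - 1 + 1 + 1 = 2, so z_out must hold

solver = cp_model.CpSolver()
solver.Solve(model)
print(solver.Value(z_out))   # 1
```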

Objective Function. The CP model 1 uses the same objective function as the 0–1 IP model previously described in Sect. 3.

5.2 Constraint Programming Model 2

Decision Variables. The CP model 2 uses the \(X_{i,t}\) and \(Y_{i,t}\) decision variables previously described in Sect. 3.

Parameters. The CP model 2 uses the same set of parameters as the 0–1 IP model previously described in Sect. 3.

Constraints. The CP model 2 has the following constraints:

$$\begin{aligned}&\text {Constraints }(1){-}(6)&\nonumber \\&( In (j,l,t) \ge 0) = Expr _{j,l,t}&\forall _{u_{j,l} \in J(l), l\in \{2,\dots ,L\}, t\in \{1,\dots ,H\}} \end{aligned}$$
(11)

where the input expression \( In (j,l,t)\) for neuron \(u_{j,l}\in J(l)\) in layer \(l\in \{2,\dots ,L\}\) at time step \(t\in \{1,\dots ,H\}\) is equal to \(\sum _{u_{i,l-1}\in J(l-1)}{\bar{w}_{i,j,l}} ( 2 \cdot Expr _{i,l-1,t} - 1) + B(j,l)\), and the output expression \( Expr _{j,l,t}\) represents the binary output of neuron \(u_{j,l}\in J(l)\) in layer \(l\in \{2,\dots ,L\}\) at time step \(t\in \{1,\dots ,H\}\). In the above model, constraint (11) models the activation of each neuron in the learned BNN by replacing the decision variable \({Z}_{j,l,t}\) in constraint (10) with the expression \( Expr _{j,l,t}\). The difference between an integer variable and an expression is that, during solving, the solver does not store the domain (i.e., the current set of possible values) of an expression. Expressions allow more scope for the presolve of CP Optimizer  [17] to rewrite the constraints into a more suitable form, and allow the use of more specific propagation scheduling.

Objective Function. The CP model 2 uses the same objective function as the 0–1 IP model that is previously described in Sect. 3.

6 Model Symmetry

Examining the 0–1 IP (or equivalently the PBO) model, one can see that the bulk of the model consists of copies of the learned BNN constraints over all time steps. These constraints model the activation of each neuron (i.e., constraints (4)–(8)) and constrain the input of the BNN (i.e., constraint (3)). The remainder of the model consists of constraints on the initial and goal states (i.e., constraints (1)–(2)). So if we ignore the initial and goal state constraints, the model is symmetric over the time steps. Note that this symmetry property is not a global one: the model is not symmetric as a whole. Rather, this local symmetry arises because subsets of constraints are isomorphic to each other. Because of this particular form of symmetry, the classic approach of adding symmetry breaking predicates  [5] would not be sound, as that approach requires global symmetry.

Instead, we exploit this symmetry by deriving symmetric nogoods on-the-fly. If a nogood is derived purely from constraints (3)–(8), and if a sufficiently small subset of time steps was involved in its derivation, then we can shift the time steps of this nogood over the planning horizon, learning a valid symmetric nogood. To track which constraints a nogood is derived from, we use the SAT technique of marker literals lifted to PBO. Each constraint is extended with some marker literal, which, if true, enforces the constraint, and if false, trivially satisfies it. During the search, these marker literals are a priori assumed true, so we are solving essentially the same problem, but the nogood learning mechanism of the PBO solver ensures the marker literal of a constraint appears in a nogood if that constraint was required in the derivation of the nogood.

By introducing marker literals \(L_t\) for all time steps \(t \in \{1,\dots , H\}\) for constraints (3)–(8) and an extra “asymmetric” marker literal \(L^*\) for constraint (2) and the constraints originating from bounding the objective function, and then treating all initial state constraints as markers, we can track if only constraints (3)–(8) were involved in the derivation of a nogood, and if so, for which time steps. When we find that the constraints involved in creating a nogood only refer to constraints from time steps l to u, then we know that symmetric copies of these nogoods are also valid for time steps \({l + \varDelta }\) to \({u + \varDelta }\) for all \(-l< \varDelta <0\), \(0<\varDelta \le H - u\). Our approach to exploiting symmetry is similar to the ones proposed for bounded model checking in SAT  [30, 31].

Example 3

Marker literals are used to "turn on" the constraints, and they are set to true throughout the search. For example, constraint (8) becomes

$$ L_t \rightarrow \bigl ( B(j,l) + |J(l-1)| + 1 \bigr ) {Z}_{j,l,t} - 1 \ge In (j,l,t) \quad \forall _{u_{j,l} \in J(l), l\in \{2,\dots ,L\}, t\in \{1,\dots ,H\}} $$

or equivalently, the binary linear constraint

$$ \mathbf{M} (1 - L_t) + \bigl ( B(j,l) + |J(l-1)| + 1 \bigr ) {Z}_{j,l,t} - 1 \ge In (j,l,t) \quad \forall _{u_{j,l} \in J(l), l\in \{2,\dots ,L\}, t\in \{1,\dots ,H\}} $$

with \(\mathbf{M} \) chosen large enough so that the constraint trivially holds if \(L_t = 0\).    \(\square \)
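
Since \( In (j,l,t) \le B(j,l) + |J(l-1)|\) always holds, a concrete choice (our derivation; the value of \(\mathbf{M} \) is left unspecified above) is

$$ \mathbf{M} = B(j,l) + |J(l-1)| + 1, $$

because with \(L_t = 0\) and \({Z}_{j,l,t} = 0\) the left-hand side becomes \(B(j,l) + |J(l-1)|\), which is an upper bound on \( In (j,l,t)\).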

We consider two ways of symmetric nogood derivation:

  • All: whenever we discover a nogood that is a consequence of constraints from time steps l to u, we add a suitably renamed copy of the nogood to the variables for time steps \({l+\varDelta }\) to \({u + \varDelta }\) for all \(-l< \varDelta <0\), \(0<\varDelta \le H - u\),

  • Propagate: we consider each possible \(\varDelta \) above, but only add the renamed nogood if it will immediately propagate or fail, similar to a SAT symmetric clause learning heuristic  [8].

Finally, we denote Base as the version of RoundingSat that does not add the symmetric nogoods.

Example 4

Consider the problem in Example 1. Assume we generate a nogood \(X_{1,1} \vee Y_{1,1} \vee \lnot X_{1,2}\), which is a consequence only of constraints for the BNNs for time steps 1 and 2. The actual generated nogood is then \(\lnot L_1 \vee \lnot L_2 \vee X_{1,1} \vee Y_{1,1} \vee \lnot X_{1,2}\) which illustrates that it depends only on the constraints in time steps 1 and 2. We can then add a symmetric copy \(\lnot L_2 \vee \lnot L_3 \vee X_{1,2} \vee Y_{1,2} \vee \lnot X_{1,3}\) for time steps 2 and 3, as well as \(\lnot L_3 \vee \lnot L_4 \vee X_{1,3} \vee Y_{1,3} \vee \lnot X_{1,4}\) for steps 3 and 4. These new constraints must be correct, since the BNN constraints for time steps \(t\in \{1,2,3,4\}\) are all symmetric. The marker literals are added so that later nogoods making use of these nogoods also track which time steps were involved in their generation. Using All we add both these nogoods, using Propagate we only add those that are unit or false in the current state of the solver.    \(\square \)
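
The time-shifting in Example 4 is purely mechanical. The following Python sketch (ours; a toy model of the All rule, whereas the actual implementation works inside RoundingSat on its internal literal representation) enumerates the shifted copies of a nogood whose literals are (name, time step, polarity) triples:

```python
def symmetric_copies(nogood, horizon):
    """Yield every valid time-shifted copy of a nogood derived from steps l..u."""
    times = [t for (_, t, _) in nogood]
    l, u = min(times), max(times)
    for delta in range(1 - l, horizon - u + 1):   # -l < delta <= H - u
        if delta != 0:                            # delta = 0 is the original nogood
            yield [(name, t + delta, pol) for (name, t, pol) in nogood]

# The nogood of Example 4: ~L_1 v ~L_2 v X_{1,1} v Y_{1,1} v ~X_{1,2}, with H = 4.
nogood = [("L", 1, False), ("L", 2, False),
          ("X_1", 1, True), ("Y_1", 1, True), ("X_1", 2, False)]
for copy in symmetric_copies(nogood, horizon=4):
    print(copy)   # one copy over steps 2-3 and one over steps 3-4, as in Example 4
```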

7 Experimental Results

In this section, we present results on two sets of computational experiments. In the first set of experiments, we compare different approaches to solving the learned planning problem \(\tilde{\varPi }\) with mathematical optimization models. In the second set of experiments, we present preliminary results on the effect of deriving symmetric nogoods when solving \(\tilde{\varPi }\) over long horizons H.

Fig. 4. Cumulative number of problems solved by the IP (blue), WP-MaxSAT (red), CP1 (green), CP2 (black) and PBO (orange) models over 27 instances of the problem \(\tilde{\varPi }\) within the time limit. (Color figure online)

7.1 Experiments 1

We first experimentally test the runtime efficiency of solving the learned planning problem \(\tilde{\varPi }\) with mathematical optimization models using off-the-shelf solvers. All the existing benchmark instances of the learned planning problem \(\tilde{\varPi }\) (i.e., 27 in total) were used  [26]. We ran the experiments on a MacBook Pro with a 2.8 GHz Intel Core i7 and 16 GB of memory, with a one hour total time limit per instance. We used the MIP solver CPLEX 12.10  [17] to optimize the 0–1 IP model, MaxHS  [6] with CPLEX 12.10 as its underlying linear programming solver to optimize the WP-MaxSAT model  [26], and CP Optimizer 12.10  [17] to optimize the CP Model 1 (CP1) and the CP Model 2 (CP2). Finally, we optimized a pseudo-Boolean optimization (PBO) model, which simply replaces all binary variables in the 0–1 IP model with Boolean variables, using RoundingSat  [10].

In Fig. 4, we visualize the cumulative number of problems solved by all five models, namely IP (blue), WP-MaxSAT (red), CP1 (green), CP2 (black) and PBO (orange), over 27 instances of the learned planning problem \(\tilde{\varPi }\) within the one hour time limit. Figure 4 clearly highlights the experimental efficiency of solving the PBO model. We find that the PBO model with RoundingSat solves all existing benchmarks in under 1000 s. In contrast, the 0–1 IP model performs poorly, with only 19 out of 27 instances solved within the one hour time limit. The remaining three models, WP-MaxSAT, CP1 and CP2, demonstrate relatively comparable runtime performance, which we explore in more detail next.

Fig. 5. Pairwise runtime comparison between the WP-MaxSAT, CP1 and CP2 models over 27 instances of problem \(\tilde{\varPi }\) within the time limit.

In Figs. 5a, 5b and 5c, we present scatter plots comparing the WP-MaxSAT, CP1 and CP2 models. In each figure, each dot (red) represents an instance of the learned planning problem \(\tilde{\varPi }\) and each axis represents a model (i.e., WP-MaxSAT, CP1 or CP2). If a dot falls below the diagonal line (blue), the corresponding instance is solved faster by the model on the y-axis than by the model on the x-axis. In Fig. 5a, we compare the two CP models CP1 and CP2. A detailed inspection of Fig. 5a shows comparable runtime performance on the instances that take less than 1000 s to solve (i.e., most dots fall close to the diagonal line). On the remaining two instances, which CP2 solves in under 1000 s, CP1 exceeds the one hour time limit. These results suggest that using expressions instead of decision variables to model the neurons of the learned BNN allows the CP solver to solve the harder instances (i.e., instances that take more than 1000 s to solve) more efficiently. In Figs. 5b and 5c, we compare CP1 and CP2 against the WP-MaxSAT model, respectively. Both figures show a similar trend in runtime performance: the CP models solve the instances that take less than 1000 s one to two orders of magnitude faster than the WP-MaxSAT model, while the WP-MaxSAT model performs comparably to the CP models on the harder instances. Overall, we find that WP-MaxSAT solves one more instance than CP1 and one fewer than CP2 within the one hour time limit. These results suggest that the WP-MaxSAT model pays a heavy price for the large size of its compilation on the instances that take less than 1000 s to solve, and only benefits from its SAT-based encoding on the harder instances.

Fig. 6. Pairwise runtime comparison between the PBO model and the WP-MaxSAT, CP1 and CP2 models over 27 instances of problem \(\tilde{\varPi }\) within the time limit.

Next, in Figs. 6a, 6b and 6c, we compare the runtime performance of WP-MaxSAT, CP1 and CP2 against the best performing model (i.e., PBO) in more detail. These plots show that the PBO model significantly outperforms the WP-MaxSAT, CP1 and CP2 models across all instances. Specifically, Fig. 6a shows that the PBO model beats the previous state-of-the-art WP-MaxSAT model on every instance by one to two orders of magnitude in runtime. Similarly, Figs. 6b and 6c show that the PBO model outperforms both CP models by an order of magnitude on all instances, except one and two instances, respectively.

It is interesting that the 0–1 IP model works so poorly for the MIP solver, while the equivalent PBO model is solved efficiently using a PBO solver. It seems that the linear relaxations used by the MIP solver are too weak to generate useful information, and it ends up having to fix activation variables in order to reason meaningfully. In contrast, it appears that the PBO solver is able to determine some useful information from the neuron constraints without necessarily fixing the activation variables—probably since it uses integer-based cutting planes reasoning  [4] rather than continuous linear programming reasoning for the linear expressions—and the nogood learning helps it avoid repeated work.

7.2 Experiments 2

We next evaluate the effect of symmetric nogood derivation on solving the learned planning problem \(\tilde{\varPi }\) over long horizons H. For these experiments, we generated instances by incrementing the value of the planning horizon H in the benchmark instances in Sect. 7.1, and used the same hardware and time limit settings. We modified the best performing solver RoundingSat  [10] to include symmetry reasoning as discussed in Sect. 6.

Fig. 7. Cumulative number of problems solved by the Base (blue), Propagate (red) and All (black) versions over 21 instances of the problem \(\tilde{\varPi }\) within the time limit. (Color figure online)

Fig. 8. Pairwise runtime comparison between the Base, Propagate and All versions over 21 instances of problem \(\tilde{\varPi }\) within the time limit.

In Fig. 7, we visualize the cumulative number of problems solved by all three versions of RoundingSat, namely Base (blue), Propagate (red) and All (black), over 21 instances of the learned planning problem \(\tilde{\varPi }\) with long horizons H within the one hour time limit. Figure 7 demonstrates that symmetric nogood derivation can improve the efficiency of solving the underlying PBO model. We find that Propagate solves the most instances within the time limit. A more detailed inspection of the results further suggests that, between the remaining two versions of RoundingSat, All solves more instances faster compared to Base.

Next, in Figs. 8a, 8b and 8c, we explore the pairwise runtime comparisons of the three versions of RoundingSat in more detail. In Figs. 8a and 8b, we compare Propagate and All against Base, respectively. It is clear from these scatter plots that Propagate and All outperform Base in terms of runtime performance. Specifically, in Fig. 8b, we find that All outperforms Base by up to three orders of magnitude in terms of runtime performance. Finally, in Fig. 8c, we compare the two versions of RoundingSat that are enhanced with symmetric nogood derivation. A detailed inspection of Fig. 8c reveals that All is slightly faster than Propagate in general.

8 Related Work, Conclusions and Future Work

In this paper, we studied the problem of automated planning with learned BNN transition models and made four contributions. First, we showed that the feasibility problem is NP-complete. Unlike the proof presented for the task of verifying learned BNNs  [3], our proof does not rely on setting weights to zero (i.e., sparsification). Instead, our proof achieves the same expressivity for fully connected BNN architectures, without adding layers or increasing their width, by representing each Boolean input with two copies of action variables. Second, we introduced two new CP models for the problem. Third, we presented detailed computational results for solving the existing instances of the problem. Lastly, we studied the effect of deriving symmetric nogoods on solving new instances of the problem with long horizons.

It appears that BNN models provide a perfect class of problems for pseudo-Boolean solvers, since each neuron is modelled by pseudo-Boolean constraints, but the continuous relaxation is too weak for MIP solvers to take advantage of, while propagation-based approaches suffer since they are unable to reason about linear expressions directly. PBO solvers directly reason about integer (0–1) linear expressions, making them very strong on this class of problems.

Our results have the potential to improve other important tasks with learned BNNs (and other DNNs), such as automated planning in real-valued action and state spaces  [27, 35, 36], decision making in discrete action and state spaces  [21], goal recognition  [11], training  [33], verification  [9, 15, 19, 22], robustness evaluation  [32] and defenses to adversarial attacks  [34], which rely on efficiently solving similar problems that we solve in this paper. The derivation of symmetric nogoods is a promising avenue for future work, in particular, if a sufficient number of symmetric nogoods can be generated. Relating the number of derived symmetric nogoods to the wall-clock speed-up of the solver or the reduction of the search tree might shed further light on the efficacy of this approach.