1 Introduction

Active Inference (AIF) is a unifying theory of action selection in theoretical neuroscience [15,16,17]. It proposes that action selection, like perception, is fundamentally a problem of inference, and that agents select actions by maximizing evidence under a biased generative model [5, 13]. Active inference operates under the aegis of the Bayesian brain hypothesis [8, 19] and the free energy principle [1, 4, 10, 11, 14, 26], and possesses several neurobiological process theories [13, 33].

Recent work [29] has uncovered close links between active inference and the framework of control as inference, which shows how many classical reinforcement learning algorithms can be understood as performing Bayesian inference to infer optimal actions [2, 20, 35, 40]. These works, as well as the related duality between control and inference in linearly solvable MDPs [38, 39], have allowed classical objects in reinforcement learning, such as Q-functions and value functions, to be understood in terms of Bayesian filtering posteriors. Similarly, close connections between active inference and reinforcement learning methods have also been demonstrated [23, 24, 42]. It has been shown that deep active inference agents can be derived that perform actor-critic algorithms [24] as well as model-based reinforcement learning [9, 41, 42], while the fundamental difference between the two frameworks has been found to lie in how value is encoded into the generative model [25, 29]. Moreover, it has become clear that active inference can be understood and applied in a model-free (Bellman-equation) paradigm simply with a distinct reward function (the expected free energy) [6, 24].

However, while much of this work has focused on understanding value functions and model-based RL, another fundamental object in model-free reinforcement learning, the successor representation [7], has received much less attention. The successor representation (SR) [7] provides an alternative way to estimate value functions. Instead of estimating the value function with the Bellman backup as in temporal difference (TD) learning, a successor matrix of long-term discounted state transitions is estimated and then dynamically combined with the reward function to yield the value function for a fixed policy. Compared to estimating the value function directly, the SR requires more memory to store the successor matrix, but it grants the ability to dynamically recompute the value function as the reward function changes, as well as providing a compressed 'cognitive map' of the environment which can be used directly for exploration and option discovery [21, 22, 30]. Moreover, from a neuroscientific perspective, the SR has been closely linked to representations in the hippocampus [31, 37], which are concerned with representing abstract (usually spatial) relations [3, 43, 44].

In this work, applying a probabilistic interpretation of the SR, we show how the SR can be directly integrated with standard methods in active inference, resulting in the successor-representation active inference (SR-AIF) agent. We show that SR-AIF has significant computational-complexity benefits over standard AIF and, moreover, that the explicit generative model in AIF allows the successor matrix to be computed immediately, without requiring substantial experience in the environment. Additionally, we show how SR methods can flexibly represent the value function of the expected free energy (EFE) and can be used to dynamically trade off exploration and exploitation at run-time.

2 Active Inference

Discrete-state-space active inference possesses a large literature and several thorough tutorials [5, 13, 36], so we provide only the essentials here. AIF considers agents acting in POMDPs with observations o, states x, and actions u. The agent optimizes over policies \(\pi = [u_1, u_2, \dots ]\), which are simply sequences of actions. The agent is typically assumed to be equipped with a generative model of the environment \(p(o_{1:T},x_{1:T})\) which describes how observations and states are related over time. This generative model can be factorized into two core components: a likelihood model \(p(o_t | x_t)\), which states how observations are generated from states and is represented by a likelihood matrix denoted A, and a transition model \(p(x_t | x_{t-1}, u_{t-1})\), which states how states change depending on the previous state and action and is represented by a transition matrix denoted B(u). The rewards or goals of the agent are encoded as strong priors in the generative model and are represented by a 'goal vector' denoted C. Since AIF considers agents embedded in a POMDP, the agent has to solve both state-inference and action-selection problems. State inference is performed using variational inference with a categorical variational distribution q(x), which is obtained by minimizing the variational free energy,

$$\begin{aligned} q^*(x) = \underset{q}{argmin}\ \, \, \mathcal {F} = \underset{q}{argmin}\ \, \, \mathbb {E}_{q(x_t | o_t)}[\log q(x_t | o_t) - \log p(o_t, x_t | x_{t-1}, u_{t-1})] \end{aligned}$$
(1)
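To make the structure of the generative model and the state-inference step concrete, the following minimal sketch (illustrative shapes and names, not the API of any particular library) builds a toy A, B(u), and C and computes the categorical posterior that minimizes Eq. 1. The convention that A is (observations × states) and B(u) is column-stochastic with shape (next state × current state × action) is an assumption made for illustration.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Minimal sketch of a discrete generative model (illustrative sizes, not from the paper).
num_states, num_obs, num_actions = 4, 4, 2
A = np.eye(num_obs, num_states)                       # likelihood p(o|x): identity = fully observed
B = np.stack([np.roll(np.eye(num_states), k, axis=0)  # transitions p(x'|x,u) for each action u
              for k in (1, -1)], axis=2)              # shape (next state, current state, action)
C = softmax(np.array([0., 0., 0., 4.]))               # goal prior over observations

def infer_state(o_idx, q_prev, u_prev):
    """Minimize F for a categorical q(x_t): here the optimum is q(x_t) ∝ p(o_t|x_t) p(x_t|x_{t-1},u_{t-1})."""
    prior = B[:, :, u_prev] @ q_prev                  # predicted state density from the transition model
    log_post = np.log(A[o_idx, :] + 1e-16) + np.log(prior + 1e-16)
    return softmax(log_post)

q = np.ones(num_states) / num_states                  # flat initial belief
q = infer_state(o_idx=2, q_prev=q, u_prev=0)
```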

AIF uses a unique objective function called the Expected Free Energy (EFE), which combines utility or reward maximization with an information-gain term that promotes exploration. AIF agents therefore naturally perform both reward-seeking and information-seeking behaviour [15, 27, 28, 32]. The EFE is defined as,

$$\begin{aligned} \mathcal {G}_t(o_t, x_t)&= \mathbb {E}_{q(o_t, x_t)}[\log q(x_t) - \log \tilde{p}(o_t, x_t |x_{t-1}, u_{t-1})] \nonumber \\&=-\underbrace{\mathbb {E}_{q(o_t,x_t)}[\log \tilde{p}(o_t)]}_{\text {Expected Utility}} - \underbrace{\mathbb {E}_{q(o_t)}\big [KL[q(x_t | o_t)||q(x_t)]\big ]}_{\text {Expected Information Gain}} \end{aligned}$$
(2)

where \(\tilde{p}\) is a 'biased' generative model which contains the goal prior vector C. As can be seen, the EFE decomposes into a reward-seeking and an exploratory component, which underlies the flexible uncertainty-reducing behaviour of AIF agents. To select actions, AIF samples from the prior over policies \(q(\pi )\), which is defined as the softmax over the path integral of the EFE into the future, evaluated for each policy. Typically, policies are evaluated up to a time horizon T. This path integral can be expressed as,

$$\begin{aligned} q(\pi ) = \sigma (\sum _t^T \mathcal {G}^\pi _t(o_t, x_t)) \end{aligned}$$
(3)

where \(\sigma (x) = \frac{e^{-x}}{\sum _x e^{-x}}\) is the softmax function. Evaluating this path integral exactly for each policy is typically extremely computationally expensive and has exponential complexity due to the exponentially branching number of possible futures to be evaluated. This causes AIF agents to run slowly in practice and has encouraged research into alternative 'deep' active inference agents which estimate this path integral in more efficient (but only approximate) ways [5, 12, 13]. Here, we present a novel approach based on successor representations.
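For a single future timestep, the decomposition in Eq. 2 can be evaluated directly from A, the goal prior, and a predicted state density. The sketch below (an illustration of ours, not the exact computation used by any AIF implementation) writes the expected information gain as the mutual information between predicted states and observations.

```python
import numpy as np

def entropy(p, axis=0):
    return -(p * np.log(p + 1e-16)).sum(axis=axis)

def expected_free_energy(qx, A, log_C):
    """One-step EFE for a predicted state density qx (a sketch of Eq. 2).

    qx:    predicted q(x_t) under a policy, shape (num_states,)
    A:     likelihood matrix p(o|x), shape (num_obs, num_states)
    log_C: log goal prior over observations, shape (num_obs,)
    """
    qo = A @ qx                                    # predicted observation density q(o_t)
    expected_utility = qo @ log_C                  # E_q(o)[log p~(o_t)]
    # Expected information gain as mutual information:
    # E_q(o)[ KL[q(x|o) || q(x)] ] = H[q(o)] - E_q(x)[ H[p(o|x)] ]
    info_gain = entropy(qo) - qx @ entropy(A, axis=0)
    return -expected_utility - info_gain           # lower G = more utility and more information

# Example: a 2-state model where state 1 yields the more informative observation.
A = np.array([[0.5, 0.9],
              [0.5, 0.1]])
log_C = np.log(np.array([0.7, 0.3]))
qx = np.array([0.5, 0.5])
G = expected_free_energy(qx, A, log_C)
```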

3 Successor Representation

The Successor Representation [7] provides an alternative way to compute the value function of a state. Instead of directly learning the value function (or Q-function), for instance by temporal difference (TD) learning, the successor representation learns the successor matrix, the discounted long-term sum of expected state occupancies, from which the value function can be computed dynamically by simply multiplying the successor matrix with the reward function. This allows the SR to instantly adapt behaviour to changing reward functions online, without explicit model-based planning.

The value function, \(\mathcal {V}\), is the expected long-term discounted sum of rewards and satisfies the Bellman recursion,

$$\begin{aligned} \mathcal {V}^\pi (x)&= r(x) + \gamma B^\pi \mathcal {V}^\pi (x) \nonumber \\&= r(x) + \gamma B^\pi [r(x) + \gamma B^\pi [r(x) + \gamma B^\pi [ \cdots ]]] \end{aligned}$$
(4)

where we assume a fixed policy \(\pi \), a policy-dependent transition matrix \(B^\pi \), and a scalar discount rate \(0 \le \gamma < 1\). For a fixed policy the Bellman equation is linear and we can rearrange it as,

$$\begin{aligned} \mathcal {V}^\pi (x)&= r(x) + \gamma B^\pi [r(x) + \gamma B^\pi [r(x) + \gamma B^\pi [ \cdots ]]] \nonumber \\&= (I + \gamma B^\pi + \gamma ^2 B^\pi B^\pi + \cdots )r(x) = M^\pi r(x) \end{aligned}$$
(5)

where \(M^\pi \) is the successor matrix, which can be thought of as encoding the expected discounted future occupancy of state \(x'\) given a start in state x.
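As a concrete illustration of Eqs. 4–5, the sketch below (a toy chain MDP of our own choosing, not from the paper's experiments) builds the successor matrix for a fixed random-walk policy and shows that the value function can be recomputed instantly when the reward vector changes.

```python
import numpy as np

# Toy fixed-policy MDP: a random walk on a 5-state chain (illustrative).
num_states, gamma = 5, 0.95
P = np.zeros((num_states, num_states))        # row-stochastic: P[x, x'] = p(x'|x) under the fixed policy
for x in range(num_states):
    for nxt in (max(x - 1, 0), min(x + 1, num_states - 1)):
        P[x, nxt] += 0.5

# Successor matrix via the geometric series in Eq. 5 (the closed form anticipates Eq. 11).
M = np.linalg.inv(np.eye(num_states) - gamma * P)

# The value function is a dynamic recombination of M with the current reward vector.
r = np.zeros(num_states); r[-1] = 1.0         # reward at the end of the chain
V = M @ r                                     # V^pi = M r
r_new = np.zeros(num_states); r_new[0] = 1.0  # reward moves: no re-learning needed
V_new = M @ r_new
```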

4 Successor Representation as Inference

In a special class of MDPs, known as linearly solvable MDPs, there is a general duality between control and inference [38, 39], such that control can be cast as a Bayesian filtering problem in which the value function corresponds to the Bayesian filtering posterior. To see this, consider the optimal Bellman equation,

$$\begin{aligned} \mathcal {V}^*(x_t) = \underset{u}{max}\big [ r(x_t) + c(u) + \mathbb {E}_{p(x_{t+1} | x_t,u)}[\gamma \mathcal {V}^*(x_{t+1})] \big ] \end{aligned}$$
(6)

where we have added an additional control cost c(u). The fundamental challenge is the nonlinearity of the Bellman equation due to the max operation. [39] noticed that if the dynamics are fully controllable, with the action directly specifying the transition distribution \(p(x_{t+1} | x_t, u) = u(x_t)\), and the control cost is set to \(KL[u(x_t)||p(x_{t+1} | x_t)]\), which penalizes divergence from the passive (prior) dynamics, then the maximization can be performed analytically. By defining the 'desirability function' \(z(x) = e^{-\mathcal {V}^*(x)}\) and exponentiating, we can obtain a linear equation in z,

$$\begin{aligned} z(x) = e^{-r(x)}\mathbb {E}_{p(x_{t+1}| x_t)}[\gamma z(x_{t+1})] \end{aligned}$$
(7)

which can be solved easily. Crucially, however, this equation takes the same form as the Bayesian filtering recursion \(p(x_t | o_t) \propto p(o_t | x_t)\mathbb {E}_{p(x_t | x_{t-1})}\big [p(x_{t-1} | o_{t-1})\big ]\) when we identify the 'desirability' \(z(x_t)\) with the posterior \(p(x_t | o_t)\) and the exponentiated reward \(e^{-r(x_t)}\) with the likelihood \(p(o_t | x_t)\). Interestingly, this same relationship between exponentiated reward and probability is also used heuristically in the control-as-inference literature [20]. An additional subtle point is that control is about the future rather than the past, so the recursion telescopes forwards rather than backwards in time. By unrolling Eq. 7 as in Eq. 5, it is straightforward to observe that,

$$\begin{aligned} M = \sum _{\tau =t}^T \gamma ^{\tau - t} \sum _{x_{t+1:\tau -1}} \prod _{i=t+1}^{\tau } p(x_i | x_{i-1}) = \sum _{\tau = t}^T \gamma ^{\tau - t} p(x_\tau | x_t) \end{aligned}$$
(8)

In effect, we can think of M as encoding the discounted sum, over the time horizon, of the probabilities of reaching each state from the current state at every possible future time. A similar, and to our knowledge novel, result can be derived for general (not linearly solvable) MDPs with a fixed policy, except that here we derive an upper bound on the SR rather than an equality. We begin by taking the log of the Bayesian filtering recursion (run over future timesteps, as noted above) and then repeatedly apply Jensen's inequality to obtain,

$$\begin{aligned} \log p(x_t | o_t)&= \log p(o_t | x_t) + \log \mathbb {E}_{p(x_{t+1} | x_t)}[ p(x_{t+1} | o_{t+1})] \nonumber \\&\ge \log p(o_t | x_t) + \mathbb {E}_{p(x_{t+1} | x_t)} \Big [ \log p(o_{t+1} | x_{t+1}) + \mathbb {E}_{p(x_{t+2} | x_{t+1})}[ \log \dots ] \Big ] \end{aligned}$$
(9)

This has the same recursive structure as the linear Bellman equation for a fixed policy (Eq. 4), so long as we identify the value function with the log posterior and the reward with the log likelihood. The technique of Eq. 5 can then be applied to give the same probabilistic interpretation of the SR as in Eq. 8. In sum, we have shown that optimal control corresponds exactly to Bayesian filtering in the case of linearly solvable MDPs, and that the Bayesian posterior provides an upper bound in the case of a fixed policy in a general MDP. These results provide a sound probabilistic and Bayesian interpretation of the SR, which has hitherto been missing from the literature, and enable us to design mathematically principled active inference agents based upon the SR.
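The probabilistic reading of Eq. 8 is easy to check numerically: each row of M is the discounted sum of the multi-step transition probabilities \(p(x_\tau | x_t)\) under the fixed policy. The sketch below (the same toy chain as before, with an arbitrary truncation horizon of our choosing) verifies this.

```python
import numpy as np

# Numerical check of Eq. 8 on a toy 5-state chain with a fixed random-walk policy.
num_states, gamma, horizon = 5, 0.95, 400
P = np.zeros((num_states, num_states))        # row-stochastic: P[x, x'] = p(x'|x)
for x in range(num_states):
    for nxt in (max(x - 1, 0), min(x + 1, num_states - 1)):
        P[x, nxt] += 0.5

M = np.linalg.inv(np.eye(num_states) - gamma * P)

# Sum_{tau >= t} gamma^{tau - t} p(x_tau | x_t): row x of P^k is the k-step distribution from x.
M_prob = sum((gamma ** k) * np.linalg.matrix_power(P, k) for k in range(horizon))
assert np.allclose(M, M_prob, atol=1e-4)
```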

5 Successor Representation Active Inference

Using the probabilistic interpretation of the SR and the equations of discrete state-space AIF, we can construct an AIF agent which utilizes the SR to compute the value functions of actions instead of relying on exhaustive model-based planning. That is, the path integral \(\mathcal {G} = \sum _t^T \mathcal {G}_t\) that defines the policy posterior \(q(\pi ) = \sigma (\mathcal {G})\) can be treated as a value function and computed dynamically using the SR. Because in discrete AIF the generative model transition matrix B(u) is given, we can dispense with learning the successor matrix from experience. However, to apply the SR, we need to choose the policy \(\pi \) under which the value function is computed. This choice is important since the default policy must assign enough probability mass to all parts of the state-space to provide an accurate value estimate there. Heuristically, we set the default policy to be uniform over the action space, \(p(u) = \frac{1}{\mathcal {A}}\), where \(\mathcal {A}\) is the cardinality of the action space. This lets us define the default transition matrix,

$$\begin{aligned} \tilde{B} = \mathbb {E}_{p(u)}[B(u)] = \frac{1}{\mathcal {A}}\sum _i B[:,:,u_i] \end{aligned}$$
(10)

Given \(\tilde{B}\), we can analytically calculate the SR using the infinite series result,

$$\begin{aligned} M^\pi = (I + \gamma \tilde{B} + \gamma ^2 \tilde{B}^2 \cdots ) = (I - \gamma \tilde{B})^{-1} \end{aligned}$$
(11)

This means that, as long as the generative model is known, the EFE value function that determines \(q(\pi )\) can be computed exactly without any interaction with the environment by first computing \(M^\pi \) as in Eq. 11 and then multiplying by the reward function, which here is the one-step EFE: \(\mathcal {G} = M^\pi \mathcal {G}^\pi (x)\). Given this EFE value function, actions can be sampled from the posterior over actions as,

$$\begin{aligned} u \sim q(\pi ) = \sigma (\mathcal {G}) \end{aligned}$$
(12)

A slight complication is that while the SR is defined for MDPs, AIF typically assumes a POMDP structure with observations o that do not fully specify the hidden state but are related to it through the likelihood matrix A. We address this by computing observation value functions as the expectation of the state value function under the state posterior distribution,

$$\begin{aligned} \mathcal {V}^\pi (o) = \mathbb {E}_{q(x | o)}[\mathcal {V}^\pi (x)] = q M^\pi \mathcal {G}^\pi \end{aligned}$$
(13)

where \(q = [q_1, q_2 \cdots ]\) is the categorical variational posterior. The SR-AIF algorithm can thus be summarized as follows: we are given a generative model containing the A and B matrices and a set of desired states C. At initialization, the agent computes the successor matrix \(M^\pi \) under the default policy using Eqs. 10 and 11. For each action in a given state, SR-AIF computes the EFE value function \(\mathcal {G}^\pi \) for that action, and actions are then sampled from the policy posterior, which is the softmax over the EFE action-value functions. In a POMDP environment exactly the same process takes place, except that instead of action-state value functions we use action-observation value functions, computed as in Eq. 13.
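The following sketch pulls these pieces together in the order just described. It is a schematic reconstruction, not the pymdp implementation: the per-state EFE 'reward' (negative expected utility plus an entropy bonus standing in for the information-gain term), the one-step lookahead used to score actions, and the runtime weight `w_epistemic` are all illustrative choices made for this example.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def column_entropy(A):
    """Entropy of p(o|x) for each state x (columns of A)."""
    return -(A * np.log(A + 1e-16)).sum(axis=0)

def sr_aif_action_posterior(A, B, log_C, q_x, gamma=0.99, w_epistemic=1.0):
    """Schematic SR-AIF action selection (an illustrative sketch, not the pymdp API).

    A:     likelihood p(o|x), shape (num_obs, num_states)
    B:     transitions p(x'|x,u), shape (num_states, num_states, num_actions), column-stochastic
    log_C: log goal prior over observations, shape (num_obs,)
    q_x:   current state posterior q(x|o), shape (num_states,)
    """
    num_states, _, num_actions = B.shape

    # Eq. 10: default (uniform-policy) transition matrix.
    B_tilde = B.mean(axis=2)

    # Eq. 11: successor matrix under the default policy.
    # Transpose so that rows index the current state, matching the recursion V = r + gamma * P V.
    M = np.linalg.inv(np.eye(num_states) - gamma * B_tilde.T)

    # Per-state one-step EFE "reward": negative expected utility plus an epistemic bonus.
    # The entropy bonus is our illustrative stand-in for the information-gain term of Eq. 2.
    utility = A.T @ log_C                         # E_{p(o|x)}[log p~(o)] for each state
    G_state = -utility - w_epistemic * column_entropy(A)

    # EFE value function under the default policy: G = M G^pi (cf. Eq. 5 with G_state as reward).
    G_value = M @ G_state

    # Eq. 13-style scoring: expected EFE value of the state reached by each action,
    # averaged over the current state posterior q(x|o).
    G_action = np.array([(B[:, :, u] @ q_x) @ G_value for u in range(num_actions)])

    # Eq. 12: the policy posterior is a softmax favouring low EFE.
    return softmax(-G_action)
```

Varying `w_epistemic` at runtime, with M left untouched, is one way to realize the dynamic re-weighting of utility and epistemic value discussed in Sect. 6.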

5.1 Computational Complexity

In theory, the computational complexity of SR-AIF is superior to that of standard AIF, since standard AIF uses model-based planning which evaluates the EFE value function \(\mathcal {G}\) by exhaustively computing all possible future trajectories for different policies. This has a cost that grows exponentially in the time horizon due to the branching of possible futures. If we denote the number of actions \(\mathcal {A}\), the dimension of the state-space \(\mathcal {X}\), and the time horizon T, we can approximately say that the computational complexity of standard AIF is of order \(\mathcal {O}(\mathcal {X}T^2 \cdot \mathcal {A}^T)\), since the number of possible trajectories is approximately \(\mathcal {A}^T\), evaluating each step of a trajectory costs of order \(\mathcal {X}\), and this must be repeated for each timestep. This is exponential in the time horizon and renders AIF unsuitable for long-term planning. Several heuristic methods have been proposed to handle this, usually by pruning obviously unsuccessful policies [12]. However, this does not remove the exponential complexity but only reduces it by a constant factor. In practice, this exponential explosion is handled by reducing the time horizon or the policy space to be searched, which renders the evaluation of \(\mathcal {G}\) approximate and makes the resulting AIF agents myopic to long-term reward contingencies.

By contrast, SR-AIF analytically computes an approximation to the EFE value function \(\mathcal {G}\) directly from the known transition dynamics via Eq. 11. This means that no exhaustive simulation of the future is required for each action; instead, a one-time cost is incurred at initialization. The main cost is the matrix inverse, of order approximately \(\mathcal {X}^3\). An action must then be selected at each timestep, which costs of order \(\mathcal {A}\). The total complexity of SR-AIF is therefore of order \(\mathcal {O}(\mathcal {X}^3 + \mathcal {A}T)\). SR-AIF thus reduces the computational complexity of AIF from exponential in the time horizon to cubic in the size of the state-space and hence, in theory, allows discrete-state-space active inference to be applied to substantially larger problems than previously possible.
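As a rough illustration with hypothetical numbers (a 9 × 9 grid, five actions, a fifteen-step horizon; these are not the settings used in our experiments), the two cost estimates differ by many orders of magnitude:

```python
# Rough cost comparison for a hypothetical 9x9 gridworld (X = 81 states, A = 5 actions, T = 15 steps).
X, A_card, T = 81, 5, 15
cost_aif = X * T**2 * A_card**T          # exhaustive policy enumeration, ~O(X T^2 A^T)
cost_sr_aif = X**3 + A_card * T          # one-off matrix inverse plus per-step action scoring
print(f"standard AIF ~ {cost_aif:.2e} ops, SR-AIF ~ {cost_sr_aif:.2e} ops")
```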

Fig. 1.

Top Row: A: Schematic of the grid-world task. The AIF agent is initialized in a random square and must reach the bottom corner to obtain reward. B: The average total reward obtained by the SR-AIF and AIF agents. Due to its limited planning horizon, the AIF agent cannot solve larger gridworlds and hence incurs large negative rewards. C: Computational cost (measured in compute time per episode). For small gridworlds, SR-AIF is more expensive since the matrix-inversion cost dominates, while for larger gridworlds the cost of standard AIF increases exponentially. Bottom Row: Visualization of the default policy matrix \(\tilde{B}\), successor matrix M, and estimated value function \(\mathcal {V}^\pi \) for a \(3 \times 3\) gridworld.

6 Experiments

We empirically demonstrate the superior computational complexity and resulting performance of SR-AIF as the state-space and time horizon grow, on a series of grid-world environments. These provide a simple test-bed for evaluating computational complexity in practice without the confounding factors introduced by more complex environments. The agent is initialized randomly in an \(N \times N\) grid and must reach a reward located in the bottom corner of the grid. On average, as the grid size increases, both the state-space size and the planning horizon required to find this reward increase.

We implemented the AIF agent using the pymdp library for discrete state-space AIF [18]. We found that for larger grid sizes the matrix inverse used to compute the successor matrix often became numerically unstable. Heuristically, we countered this by increasing the 'discount rate' \(\gamma \) in Eq. 11 to be greater than 1 (we used 5 for larger grid sizes). Otherwise, \(\gamma \) for both SR-AIF and standard AIF was set to 0.99. For the active inference agent, to keep computation times manageable, we used a planning horizon of 7 and a policy length of 7.

However, this task had no epistemic contingencies and only involved reward maximization. A key aspect of active inference, though, is its native handling of uncertainty through the EFE objective. Here, we demonstrate that the SR representation can adapt to uncertainty and dynamically change the balance of exploration and exploitation. To do so, we introduce an uncertainty and exploration component into the gridworld task by setting some squares of the grid to be 'unknowable', such that if the agent is on one of these squares, it is equally likely to receive an observation from any other square in the same row of the grid. This is done by setting the corresponding columns of the A matrix to uniform distributions for each 'unknowable' square. We show that, when equipped with the EFE objective function, SR-AIF is able to instantly recompute the value function based on this information in the A matrix without having to change the successor matrix M. Moreover, because the value function can be recomputed for any reward function, the utility and information-gain components of the EFE can be re-weighted dynamically at runtime.
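Concretely, the manipulation amounts to overwriting the likelihood columns of the chosen squares, as in the sketch below (the grid size and the indices of the 'unknowable' squares are hypothetical, not those used in the experiment). The per-state observation entropy computed at the end corresponds to the quantity visualized in Fig. 2B.

```python
import numpy as np

# Sketch of the 'unknowable' squares manipulation (illustrative indices).
grid_size = 5
num_states = num_obs = grid_size ** 2
A = np.eye(num_obs, num_states)                  # fully observed gridworld to start

unknowable = [7, 12]                             # hypothetical 'unknowable' squares
for s in unknowable:
    row = s // grid_size
    row_states = np.arange(row * grid_size, (row + 1) * grid_size)
    A[:, s] = 0.0
    A[row_states, s] = 1.0 / grid_size           # uniform over observations from the same row

# Per-state observation entropy: high for the 'unknowable' squares (cf. Fig. 2B).
state_entropy = -(A * np.log(A + 1e-16)).sum(axis=0)
```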

Fig. 2.

Effect of introducing observation uncertainty into the model. A: The A matrix with two 'unknowable' squares, resulting in uniform distributions in two columns. B: The corresponding entropy over the state-space, with the two 'unknowable' squares having high entropy. C and D: The value function computed using the EFE, which responds positively to regions of high uncertainty due to the potential for information gain, compared with the value function computed from the standard reward function. SR-AIF is able to correctly combine utility and epistemic drives at runtime.

7 Discussion

In this paper, we have derived a probabilistic interpretation of the SR and related it to control as inference and linear RL. We then constructed an SR-AIF algorithm which exhibits significant performance and computational-complexity benefits over standard AIF, due to its amortization of policy selection using a successor matrix that can be computed analytically at initialization.

It is important to note that while SR-AIF has substantially better computational complexity, this comes at the cost of a necessary approximation. The successor matrix is computed only for a fixed default policy \(\pi \), and the choice of this policy can have significant effects upon the estimated value function and hence upon behaviour. The choice of the default policy is thus important for performance and was here made entirely on heuristic grounds. Principled ways of estimating or bootstrapping better default policies would be important for improving the performance of SR-AIF in practice. Alternatively, the MDP itself could be regularized so that it becomes linearly solvable, as in [39], such that the optimal policy can be solved for directly. This approach has been applied in a neuroscience context [34], but its extension to active inference remains to be investigated.

Another point of extension is that here we have considered AIF and SR-AIF in the context of a single sensory modality and a single-factor generative model. However, many tasks modelled by AIF use multiple factors and modalities to express more complex relationships and contingencies. The extension of SR-AIF to multiple modalities and factors is straightforward algebraically, but has subtle implementation details and is left to future work.