Abstract
Despite indisputable advances in reinforcement learning (RL) research, some cognitive and architectural challenges remain. The primary challenge in the current conception of RL stems from the way the theory defines states. Whereas states under laboratory conditions are tractable (due to the Markov property), states in real-world RL are high-dimensional, continuous, and partially observable. Hence, effective learning and generalization can only be guaranteed if the subset of reward-relevant dimensions is correctly identified for each state. Moreover, the computational discrepancy between model-free and model-based RL methods creates a stability-plasticity dilemma: how should optimal decision-making be controlled when multiple interacting and competing systems each implement a different type of RL method? Drawing on behavioral results showing that human subjects in a reversal learning paradigm define states more flexibly than a simple RL model predicts, we argue that these challenges can be met by infusing the RL framework, as an algorithmic theory of human behavior, with the strengths of the attractor framework at the level of neural implementation. Our position is supported by the hypothesis that 'attractor states', stable patterns of self-sustained and reverberating brain activity, are a manifestation of the collective dynamics of neuronal populations in the brain. With its capacity for pattern completion and its ability to link events in temporal order, an attractor network is relatively insensitive to noise and can therefore account for the sparse data characteristic of high-dimensional, continuous real-world RL.
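The pattern-completion and noise-insensitivity properties attributed to attractor networks in the abstract can be illustrated with a minimal Hopfield network sketch. The network size, number of stored patterns, noise level, and Hebbian outer-product learning rule below are illustrative choices, not parameters from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Store three random binary (+1/-1) patterns in a 64-unit Hopfield network,
# well below the classical capacity limit of ~0.14 patterns per unit.
n_units, n_patterns = 64, 3
patterns = rng.choice([-1, 1], size=(n_patterns, n_units))

# Hebbian outer-product learning rule; self-connections are removed.
W = sum(np.outer(p, p) for p in patterns) / n_units
np.fill_diagonal(W, 0)

def recall(state, n_steps=20):
    """Synchronously update all units until the state settles into an attractor."""
    for _ in range(n_steps):
        new_state = np.sign(W @ state)
        new_state[new_state == 0] = 1  # break ties deterministically
        if np.array_equal(new_state, state):
            break  # fixed point reached: the state is an attractor
        state = new_state
    return state

# Corrupt 10 of the 64 units of the first stored pattern (a noisy, partial cue) ...
probe = patterns[0].copy()
flip = rng.choice(n_units, size=10, replace=False)
probe[flip] *= -1

# ... and let the attractor dynamics complete the pattern.
completed = recall(probe)
overlap = np.mean(completed == patterns[0])
print(f"overlap with stored pattern after recall: {overlap:.2f}")
```

Starting from a cue that agrees with the stored pattern on only ~84% of units, the dynamics fall into the corresponding attractor basin and restore the pattern almost perfectly, which is the sense in which such networks tolerate noisy, sparse input.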
© 2019 Springer Nature Switzerland AG
Cite this paper
Hamid, O.H., Braun, J. (2019). Reinforcement Learning and Attractor Neural Network Models of Associative Learning. In: Sabourin, C., Merelo, J.J., Madani, K., Warwick, K. (eds) Computational Intelligence. IJCCI 2017. Studies in Computational Intelligence, vol 829. Springer, Cham. https://doi.org/10.1007/978-3-030-16469-0_17
Print ISBN: 978-3-030-16468-3
Online ISBN: 978-3-030-16469-0
eBook Packages: Intelligent Technologies and Robotics (R0)