Abstract
There is considerable interest in ethical designs for artificial intelligence (AI) that do not pose risks to humans. This paper proposes using elements of Hutter’s agent-environment framework to define a decision support system for simulating, visualizing and analyzing AI designs in order to understand their consequences. The simulations do not have to be accurate predictions of the future; rather, they show the futures that an agent design predicts will fulfill its motivations, which AI designers can explore to find risks to humans. In order to safely create a simulation model, this paper shows that the most probable finite stochastic program to explain a finite history is finitely computable, and that there is an agent that makes such a computation without any unintended instrumental actions. It also discusses the risks of running an AI in a simulated environment.
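The abstract's central computation, selecting the most probable finite stochastic program to explain a finite history, can be illustrated with a toy sketch. This is not the paper's algorithm: it restricts the candidate "programs" to a hypothetical set of Bernoulli models and scores each by a length-based prior 2^(-|p|) times its likelihood on the observed history, in the spirit of the algorithmic-probability approach of Hutter (2005) and Li and Vitanyi (1997). The candidate set and description lengths below are invented for illustration.

```python
import math

def log_score(history, p_one, description_bits):
    """log2 of (prior * likelihood) for a Bernoulli(p_one) model,
    with prior 2^(-description_bits)."""
    ll = 0.0
    for bit in history:
        pr = p_one if bit == 1 else 1.0 - p_one
        if pr == 0.0:
            return float("-inf")  # model assigns zero probability to history
        ll += math.log2(pr)
    return -description_bits + ll

def most_probable_model(history, candidates):
    """candidates: list of (p_one, description_bits) pairs.
    Returns the candidate maximizing prior-weighted likelihood."""
    return max(candidates, key=lambda c: log_score(history, c[0], c[1]))

# Hypothetical candidates: coarser probabilities get shorter descriptions.
candidates = [(0.5, 1), (0.25, 2), (0.75, 2), (0.9, 4), (0.1, 4)]
history = [1, 1, 0, 1, 1, 1, 0, 1]
best = most_probable_model(history, candidates)
```

Because the candidate set is finite and each score is computed in finitely many steps, the argmax is finitely computable, which is the flavor of the result the abstract claims for the far richer class of finite stochastic programs.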
References
Asimov, I.: Runaround. Astounding Science Fiction (1942)
Bostrom, N.: Ethical issues in advanced artificial intelligence. In: Smit, I., et al. (eds.) Cognitive, Emotive and Ethical Aspects of Decision Making in Humans and in Artificial Intelligence, vol. 2, pp. 12–17. Int. Inst. of Adv. Studies in Sys. Res. and Cybernetics (2003)
Bostrom, N.: The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. Minds and Machines (forthcoming)
Chalmers, D.: The Singularity: A Philosophical Analysis. J. Consciousness Studies 17, 7–65 (2010)
Dewey, D.: Learning What to Value. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds.) AGI 2011. LNCS (LNAI), vol. 6830, pp. 309–314. Springer, Heidelberg (2011)
Elliott, G.: US Nuclear Weapon Safety and Control. MIT Program in Science, Technology, and Society (2005), http://web.mit.edu/gelliott/Public/sts.072/paper.pdf
Ghahramani, Z.: Learning Dynamic Bayesian Networks. In: Giles, C.L., Gori, M. (eds.) IIASS-EMFCSC-School 1997. LNCS (LNAI), vol. 1387, pp. 168–197. Springer, Heidelberg (1998)
Goertzel, B.: Universal ethics: the foundations of compassion in pattern dynamics (2004), http://www.goertzel.org/papers/UniversalEthics.html
Hibbard, B., Santek, D.: The Vis5D system for easy interactive visualization. In: Proc. IEEE Visualization 1990, pp. 129–134 (1990)
Hibbard, B.: Super-intelligent machines. Computer Graphics 35(1), 11–13 (2001)
Hibbard, B.: The technology of mind and a new social contract. J. Evolution and Technology 17(1), 13–22 (2008)
Hibbard, B.: Temptation. Rejected for the AGI-09 Workshop on the Future of AI (2009), https://sites.google.com/site/whibbard/g/hibbard_agi09_workshop.pdf
Hibbard, B.: Model-based utility functions. J. Artificial General Intelligence 3(1), 1–24 (2012a)
Hibbard, B.: Avoiding Unintended AI Behaviors. In: Bach, J., Goertzel, B., Iklé, M. (eds.) AGI 2012. LNCS (LNAI), vol. 7716, pp. 107–116. Springer, Heidelberg (2012b), https://sites.google.com/site/whibbard/g/hibbard_agi12a.pdf
Hutter, M.: Universal artificial intelligence: sequential decisions based on algorithmic probability. Springer, Heidelberg (2005)
Hutter, M.: Feature reinforcement learning: Part I. Unstructured MDPs. J. Artificial General Intelligence 1, 3–24 (2009a)
Hutter, M.: Feature dynamic Bayesian networks. In: Goertzel, B., Hitzler, P., Hutter, M. (eds.) Proc. Second Conf. on AGI, AGI 2009, pp. 67–72. Atlantis Press, Amsterdam (2009b)
Kurzweil, R.: The singularity is near. Penguin, New York (2005)
Li, M., Vitanyi, P.: An introduction to Kolmogorov complexity and its applications. Springer, Heidelberg (1997)
Lloyd, S.: Computational Capacity of the Universe. Phys. Rev. Lett. 88, 237901 (2002)
Muehlhauser, L., Helm, L.: The singularity and machine ethics. In: Eden, Søraker, Moor, Steinhart (eds.) The Singularity Hypothesis: a Scientific and Philosophical Assessment. Springer, Heidelberg (2012)
Omohundro, S.: The basic AI drives. In: Wang, P., Goertzel, B., Franklin, S. (eds.) Proc. First Conf. on AGI, AGI 2008, pp. 483–492. IOS Press, Amsterdam (2008)
Puterman, M.L.: Markov Decision Processes - Discrete Stochastic Dynamic Programming. Wiley, New York (1994)
Ring, M., Orseau, L.: Delusion, Survival, and Intelligent Agents. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds.) AGI 2011. LNCS (LNAI), vol. 6830, pp. 11–20. Springer, Heidelberg (2011)
Sutton, R.S., Barto, A.G.: Reinforcement learning: an introduction. MIT Press (1998)
Waser, M.: Designing a safe motivational system for intelligent machines. In: Baum, E., Hutter, M., Kitzelmann, E. (eds.) Proc. Third Conf. on AGI, AGI 2010, pp. 170–175. Atlantis Press, Amsterdam (2010)
Waser, M.: Rational Universal Benevolence: Simpler, Safer, and Wiser Than “Friendly AI”. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds.) AGI 2011. LNCS (LNAI), vol. 6830, pp. 153–162. Springer, Heidelberg (2011)
Yudkowsky, E.: Coherent Extrapolated Volition (2004), http://www.sl4.org/wiki/CoherentExtrapolatedVolition
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Hibbard, B. (2012). Decision Support for Safe AI Design. In: Bach, J., Goertzel, B., Iklé, M. (eds) Artificial General Intelligence. AGI 2012. Lecture Notes in Computer Science(), vol 7716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35506-6_13
DOI: https://doi.org/10.1007/978-3-642-35506-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35505-9
Online ISBN: 978-3-642-35506-6