Abstract
We study the use of inverse reinforcement learning (IRL) as a tool for recognizing agents from observations of their sequential decision behavior. We model the problem faced by the agents as a Markov decision process (MDP) and model an agent's observed behavior as forward planning for that MDP. The agent's true decision problem and process may not be captured by the MDP and its policy, but we nonetheless interpret the observations as optimal actions in the MDP. We use IRL to learn reward functions for the MDP and then use these reward functions as the basis for clustering or classification models. Experimental studies with GridWorld, a navigation problem, and the secretary problem, an optimal stopping problem, examine the algorithms' performance in agent-recognition scenarios in which the agents' underlying decision strategy may or may not be expressible as an MDP policy. Empirical comparisons of our method with several existing IRL algorithms, and with direct methods that use feature statistics observed in state-action space, suggest that it may be superior for agent recognition, particularly when the state space is large but the observed decision trajectories are short.
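The recognition pipeline the abstract describes (fit a reward function to each agent's observed trajectory with IRL, then group agents by the recovered rewards) can be sketched as follows. This is an illustrative toy, not the paper's algorithm: it assumes a small chain-world MDP, a naive candidate-enumeration "IRL" step that picks the reward whose optimal policy best matches the observed actions, and exact-match grouping in place of a real clustering model. All names (`optimal_policy`, `irl_fit`, `rollout`) are invented for this sketch.

```python
import numpy as np

N_STATES, GAMMA = 5, 0.9
ACTIONS = [-1, +1]              # move left / right on a chain of states
PHI = np.eye(N_STATES)          # state features: one-hot indicators

def step(s, a):
    """Deterministic chain transition, clamped at the ends."""
    return min(max(s + a, 0), N_STATES - 1)

def optimal_policy(w):
    """Greedy policy for reward r(s) = w . phi(s), via value iteration."""
    r = PHI @ w
    v = np.zeros(N_STATES)
    for _ in range(200):
        v = np.array([max(r[s] + GAMMA * v[step(s, a)] for a in ACTIONS)
                      for s in range(N_STATES)])
    return [max(ACTIONS, key=lambda a: r[s] + GAMMA * v[step(s, a)])
            for s in range(N_STATES)]

def irl_fit(traj, candidates):
    """Naive IRL: pick the candidate reward weights whose optimal policy
    agrees with the most observed (state, action) pairs."""
    return max(candidates,
               key=lambda w: sum(optimal_policy(w)[s] == a for s, a in traj))

def rollout(w, s0=2, T=4):
    """Generate an observed trajectory from an agent acting optimally for w."""
    pi, traj, s = optimal_policy(w), [], s0
    for _ in range(T):
        a = pi[s]
        traj.append((s, a))
        s = step(s, a)
    return traj

# Two agent types: rewards concentrated at opposite ends of the chain.
w_left, w_right = np.eye(N_STATES)[0], np.eye(N_STATES)[-1]
candidates = [np.eye(N_STATES)[i] for i in range(N_STATES)]

# Three observed agents; recover a reward for each, then group by reward.
agents = [rollout(w_left), rollout(w_left), rollout(w_right)]
labels = [tuple(irl_fit(t, candidates)) for t in agents]
print(labels[0] == labels[1], labels[0] == labels[2])
```

Note that recognition here operates on the recovered reward weights rather than on raw state-action statistics, which is the distinction the abstract draws against "direct" feature-statistics methods: the first two agents receive identical labels while the third does not.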
Keywords
- Markov Decision Process
- Reward Function
- Sequential Behavior
- Feature Trajectory
- Fisher Discriminant Analysis
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Qiao, Q., Beling, P.A. (2013). Recognition of Agents Based on Observation of Their Sequential Behavior. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol 8188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40988-2_3
DOI: https://doi.org/10.1007/978-3-642-40988-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40987-5
Online ISBN: 978-3-642-40988-2