$$L^*$$ -based learning of Markov decision processes (extended version)

Tappler, Martin; Aichernig, Bernhard K.; Bacci, Giovanni; Eichlseder, Maria; Larsen, Kim G.

doi:10.1007/s00165-021-00536-5

$L^*$-based learning of Markov decision processes (extended version)

Original Article
Open access
Published: 31 March 2021

Volume 33, pages 575–615, (2021)
Cite this article

Download PDF

You have full access to this open access article

Formal Aspects of Computing

$L^*$-based learning of Markov decision processes (extended version)

Download PDF

Martin Tappler^1,2,
Bernhard K. Aichernig ORCID: orcid.org/0000-0002-3484-5584¹,
Giovanni Bacci⁴,
Maria Eichlseder³ &
…
Kim G. Larsen⁴

659 Accesses
11 Citations
Explore all metrics

Abstract

Automata learning techniques automatically generate systemmodels fromtest observations. Typically, these techniques fall into two categories: passive and active. On the one hand, passive learning assumes no interaction with the system under learning and uses a predetermined training set, e.g., system logs. On the other hand, active learning techniques collect training data by actively querying the system under learning, allowing one to steer the discovery ofmeaningful information about the systemunder learning leading to effective learning strategies. A notable example of active learning technique for regular languages is Angluin’s $L^*$-algorithm. The $L^*$-algorithm describes the strategy of a student who learns the minimal deterministic finite automaton of an unknown regular language $L$ by asking a succinct number of queries to a teacher who knows $L$.

In this work, we study $L^*$ -based learning of deterministic Markov decision processes, a class of Markov decision processes where an observation following an action uniquely determines a successor state. For this purpose, we first assume an ideal setting with a teacher who provides perfect information to the student. Then, we relax this assumption and present a novel learning algorithm that collects information by sampling execution traces of the system via testing.

Experiments performed on an implementation of our sampling-based algorithm suggest that our method achieves better accuracy than state-of-the-art passive learning techniques using the same amount of test obser vations. In contrast to existing learning algorithms which assume a predefined number of states, our algorithm learns the complete model structure including the state space.

Article PDF

$$L^*$$ -Based Learning of Markov Decision Processes

Learning deterministic probabilistic automata from a model checking perspective

Article 18 May 2016

Active Automata Learning as Black-Box Search and Lazy Partition Refinement

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

References

Aspnes, J., Herlihy, M.: Fast randomized consensus using shared memory. J Algorithms 11(3), 441–461 (1990)
Article MathSciNet Google Scholar
Aichernig, B.K., Mostowski, W., Mousavi, M.R., Tappler, M., Taromirad, M.: Model learning and model-based testing. In: Bennaceur, A., Hähnle, R., Meinke, K. (eds.) Machine learning for dynamic software analysis: potentials and limits–international Dagstuhl seminar 16172, Dagstuhl Castle, Germany, April 24–27, 2016, revised papers. lecture notes in computer science, vol. 11026, pp. 74–100. Springer (2018)
Google Scholar
Angluin, D.: Learning regular sets from queries and counterexamples. Inf Comput 75(2), 87–106 (1987)
Article MathSciNet Google Scholar
Aichernig BK, Tappler M (2017) Learning from faults: mutation testing in active automata learning. In: Barrett C, Davies M, Kahsai T (eds) NASA formal methods—9th international symposium, NFM 2017, Moffett Field, CA, USA, May 16–18, 2017, proceedings, volume 10227 of lecture notes in computer science, pp 19–34
Aichernig, B.K., Tappler, M.: Probabilistic black-box reachability checking. In: Lahiri, S.K., Reger, G. (eds.) Runtime verification–17th international conference, RV 2017, Seattle, WA, USA, September 13–16, 2017, proceedings. lecture notes in computer science, pp 50–67, vol. 10548. Springer (2017)
Google Scholar
Aichernig Bernhard, K., Martin, T.: Efficient active automata learning via mutation testing. J Autom Reason 63(4), 1103–1134 (2019)
Article MathSciNet Google Scholar
Aichernig BK, Tappler M (2019) Probabilistic black-box reachability checking (extended version). Form Methods Syst Des
Bacci G, Bacci G, Guldstrand LK, Mardare R. MDPDist library. http://people.cs.aau.dk/~giovbacci/tools/bisimdist.zip. Accessed on 04 Nov 2019
Bacci G, Bacci G, Larsen KG, Mardare R (2013) The BisimDist library: efficient computation of bisimilarity distances for Markovian models. In: Joshi KR, Siegle M, Stoelinga M, D'Argenio PR (eds) Quantitative evaluation of systems—10th international conference, QEST 2013, Buenos Aires, Argentina, August 27–30, 2013. Proceedings, volume 8054 of lecture notes in computer science. Springer, pp 278–281
Bacci G, Bacci G, Larsen KG, Mardare R (2013) Computing behavioral distances, compositionally. In: Chatterjee K, Sgall J (eds) Mathematical foundations of computer science 2013—38th international symposium, MFCS 2013, Klosterneuburg, Austria, August 26–30, 2013. Proceedings, volume 8087 of lecture notes in computer science. Springer, pp 74–85
Baier, C., Katoen, J.P.: Principles of model checking. MIT Press, Cambridge (2008)
MATH Google Scholar
Bergadano, F., Varricchio, S.: Learning behaviors of automata from multiplicity and equivalence queries. SIAM J Comput 25(6), 1268–1280 (1996)
Article MathSciNet Google Scholar
Castro, J., Gavalda, R.: Learning probability distributions generated by finite-state machines. In: Heinz, J., Sempere, J.M. (eds.) Topics in grammatical inference, pp. 113–142. Springer, Berlin (2016)
Chapter Google Scholar
Cassel, S., Howar, F., Jonsson, B., Steffen, B.: Active learning for extended finite state machines. Formal Aspects Comput 28(2), 233–263 (2016)
Article MathSciNet Google Scholar
Chow, T.S.: Testing software design modeled by finite-state machines. IEEE Trans Software Eng 4(3), 178–187 (1978)
Article Google Scholar
Chen Y, Nielsen TD (2012) Active learning of Markov decision processes for system verification. In: 11th international conference on machine learning and applications, ICMLA, Boca Raton, FL, USA, December 12–15, 2012. Volume 2. IEEE, pp 289–294
Carrasco RC, Oncina J (1994) Learning stochastic regular grammars by means of a state merging method. In: Carrasco RC, Oncina J (eds) Grammatical inference and applications, second international colloquium, ICGI-94, Alicante, Spain, September 21–23, 1994. Proceedings, volume 862 of lecture notes in computer science. Springer, pp 139–152
Carrasco, R.C., Oncina, J.: Learning deterministic regular grammars from stochastic samples in polynomial time. Theor Inf Appl 33(1), 1–20 (1999)
Article MathSciNet Google Scholar
de la Higuera, C.: Grammatical inference: learning automata and grammars. Cambridge University Press, New York (2010)
Book Google Scholar
Feng L, Han T, Kwiatkowska MZ, Parker D (2011) Learning-based compositional verification for synchronous probabilistic systems. In: Bultan T, Hsiung P-A (eds) Automated technology for verification and analysis, 9th international symposium, ATVA 2011, Taipei, Taiwan, October 11–14, 2011. Proceedings, volume 6996 of lecture notes in computer science. Springer, pp 511–521
Forejt V, Kwiatkowska MZ, Norman G, Parker D (2011) Automated verification techniques for probabilistic systems. In: Bernardo M, Issarny V (eds) Formal methods for eternal networked software systems—11th international school on formal methods for the design of computer, communication and software systems, SFM 2011, Bertinoro, Italy, June 13–18, 2011. Advanced lectures, volume 6659 of lecture notes in computer science. Springer, pp 53–113
C Ghezzi et al (2014) Mining behavior models from user-intensive web applications. In: Jalote P, Briand LC, van der Hoek A (eds) 36th international conference on software engineering, ICSE'14, Hyderabad, India—May 31–June 07, 2014. ACM, pp 277–287
Hungar H, Niese O, Steffen B (2003) Domain-specific optimization in automata learning. In: Hunt Jr. WA, Somenzi F (eds) Computer aided verification, 15th international conference, CAV 2003, Boulder, CO, USA, July 8–12, 2003. Proceedings, volume 2725 of lecture notes in computer science. Springer, pp 315–327
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58(301), 13–30 (1963)
Article MathSciNet Google Scholar
Howar, F., Steffen, B.: Active automata learning in practice–an annotated bibliography of the years 2011 to 2016. In: Bennaceur, A., Hähnle, R., Meinke, K. (eds.) Machine learning for dynamic software analysis: potentials and limits–international Dagstuhl seminar 16172, Dagstuhl Castle, Germany, April 24–27, 2016, revised papers. lecture notes in computer science, vol. 11026, pp. 123–148. Springer (2018)
Google Scholar
Isberner M, Howar F, Steffen B (2014) The TTT algorithm: a redundancy-free approach to active automata learning. In: Bonakdarpour B, Smolka SA (eds) Runtime verification—5th international conference, RV 2014, Toronto, ON, Canada, September 22–25, 2014. Proceedings, volume 8734 of lecture notes in computer science. Springer, pp 307–322
Kearns, M.J.: Efficient noise-tolerant learning from statistical queries. J ACM 45(6), 983–1006 (1998)
Article MathSciNet Google Scholar
Kwiatkowska Marta, Z., Gethin, N., David, P.: Analysis of a gossip protocol in PRISM. SIGMETRICS Perform Eval Rev 36(3), 17–22 (2008)
Article Google Scholar
Kwiatkowska MZ, Norman G, Parker D (2011) PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan G, Qadeer S (eds) Computer aided verification—23rd international conference, CAV 2011, Snowbird, UT, USA, July 14–20, 2011. Proceedings, volume 6806 of lecture notes in computer science. Springer, pp. 585–591
Kwiatkowska MZ, Parker D (2013) Automated verification and strategy synthesis for probabilistic systems. In: Van Hung D, Ogawa M (eds) Automated technology for verification and analysis—11th international symposium, ATVA 2013, Hanoi, Vietnam, October 15–18, 2013. Proceedings, volume 8172 of lecture notes in computer science. Springer, pp 5–22
Komuravelli A, Pasareanu CS, Clarke EM (2012) Learning probabilistic systems from tree samples. In: Proceedings of the 27th annual IEEE symposium on logic in computer science, LICS 2012, Dubrovnik, Croatia, June 25–28, 2012. IEEE Computer Society, pp 441–450
Khalili A, Tacchella A (2014) Learning nondeterministic Mealy machines. In: Clark A, Kanazawa M, Yoshinaka R (eds) Proceedings of the 12th international conference on grammatical inference, ICGI 2014, Kyoto, Japan, September 17–19, 2014, volume 34 of JMLR workshop and conference proceedings. JMLR.org, pp 109–123
Mao H, Chen Y, Jaeger M, Nielsen TD, Larsen KG, Nielsen B (2011) Learning probabilistic automata for model checking. In: Eighth international conference on quantitative evaluation of systems, QEST 2011, Aachen, Germany, 5–8 September, 2011. IEEE Computer Society, pp 111–120
Mao H, Chen Y, Jaeger M, Nielsen TD, Larsen KG, Nielsen B (2012) Learning Markov decision processes for model checking. In: Fahrenberg U, Legay A, Thrane CR (eds) Proceedings quantities in formal methods, QFM 2012, Paris, France, 28 August 2012., volume 103 of EPTCS, pp 49–63
Mao, H., Chen, Y., Jaeger, M., Nielsen, T.D., Larsen, K.G.: Learning deterministic probabilistic automata from a model checking perspective. Mach Learn 105(2), 255–299 (2016)
Article MathSciNet Google Scholar
Margaria T, Niese O, Raffelt H, Steffen B (2004) Efficient test-based model generation for legacy reactive systems. In: Ninth IEEE international high-level design validation and test workshop 2004, Sonoma Valley, CA, USA, November 10–12, 2004. IEEE Computer Society, pp 95–100
Nerode, A.: Linear automaton transformations. Proc Am Math Soc 9(4), 541–544 (1958)
Article MathSciNet Google Scholar
Nouri A, Raman B, Bozga M, Legay A, Bensalem S (2014) Faster statistical model checking by means of abstraction and learning. In: Bonakdarpour B, Smolka SA (eds) runtime verification—5th international conference, RV 2014, Toronto, ON, Canada, September 22–25, 2014. Proceedings, volume 8734 of lecture notes in computer science. Springer, pp 340–355
Norman, G., Shmatikov, V.: Analysis of probabilistic contract signing. J Comput Secur 14(6), 561–589 (2006)
Article Google Scholar
Pferscher A, Aichernig BK (2020) Learning abstracted non-deterministic finite state machines. In: Casola V, De Benedictis A, Rak M (eds) Testing Software and Systems—32nd IFIP WG 6.1 international conference, ICTSS 2020, Naples, Italy, December 9–11, 2020. Proceedings, volume 12543 of lecture notes in computer science. Springer, pp 52–69
Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming. Wiley series in probability and statistics. Wiley
Rivest, R.L., Schapire, R.E.: Inference of finite automata using homing sequences. Inf Comput 103(2), 299–347 (1993)
Article MathSciNet Google Scholar
Sokolova, A., de Vink, E.P.: Probabilistic automata: system types, parallel composition and comparison. In: Baier, C., Haverkort, B.R., Hermanns, H., Katoen, J.-P., Siegle, M. (eds.) Validation of stochastic systems–a guide to current research. lecture notes in computer science, vol. 2925, pp. 1–43. Springer (2004)
Chapter Google Scholar
Shahbaz M, Groz R (2009) Inferring mealy machines. In: Cavalcanti A, Dams D (eds) FM 2009: formal methods, second world congress, Eindhoven, The Netherlands, November 2–6, 2009. Proceedings, volume 5850 of lecture notes in computer science. Springer, pp 207–222
Steffen B, Howar F, Merten M (2011) Introduction to active automata learning from a practical perspective. In: Bernardo M, Issarny V (eds) Formal methods for eternal networked software systems—11th international school on formal methods for the design of computer, communication and software systems, SFM 2011, Bertinoro, Italy, June 13–18, 2011. Advanced lectures, volume 6659 of lecture notes in computer science. Springer, pp 256–296
Segala, R., Lynch, N.: Probabilistic simulations for probabilistic processes. Nord J Comput 2(2), 250–273 (1995)
MathSciNet MATH Google Scholar
Stoelinga, M.: An introduction to probabilistic automata. Bull. EATCS 78, 176–198 (2002)
MathSciNet MATH Google Scholar
Tappler M, Aichernig BK, Bloem R (2017) Model-based testing IoT communication via active automata learning. In: 2017 IEEE international conference on software testing, verification and validation, ICST 2017, Tokyo, Japan, March 13–17, 2017. IEEE Computer Society, pp 276–287
Tappler M, Aichernig BK, Bacci G, Eichlseder M, Larsen KG (2019) ${L}^*$-based learning of Markov decision processes. In: ter Beek MH, McIver A, Oliveira JN (eds) Formal methods—the next 30 years—third world congress, FM 2019, Porto, Portugal, October 7–11, 2019. Proceedings, volume 11800 of lecture notes in computer science. Springer, pp 651–669
Tappler M (2019) Learning-based testing in networked environments in the presence of timed and stochastic behaviour. PhD thesis, Graz University of Technology
Tappler M (2020) Evaluation material for ${L}^*$-based learning of Markov decision processes. https://doi.org/10.6084/m9.figshare.7960928.v2. Accessed on 06 Mar 2020, updated for extended version
Tretmans, J.: Test generation with inputs, outputs and repetitive quiescence. Softw Concepts Tools 17(3), 103–120 (1996)
MATH Google Scholar
Tretmans, J.: Model based testing with labelled transition systems. In: Hierons, R.M., Bowen, J.P., Harman, M. (eds.) Formal methods and testing, an outcome of the FORTEST network, revised selected papers. lecture notes in computer science, vol. 4949, pp. 1–38. Springer (2008)
Chapter Google Scholar
Wen-Guey, T.: Learning probabilistic automata and Markov chains via queries. Mach Learn 8, 151–166 (1992)
MATH Google Scholar
Vaandrager Frits, W.: Model learning. Commun ACM 60(2), 86–95 (2017)
Article Google Scholar
Valiant, L.G.: A theory of the learnable. Commun ACM 27(11), 1134–1142 (1984)
Article Google Scholar
Volpato M, Tretmans J (2015) Approximate active learning of nondeterministic input output transition systems. ECEASST, 72
Willemse, T.A.C.: Heuristics for ioco-based test-based modelling. In: Brim, L., Haverkort, B.R., Leucker, M., van de Pol, J. (eds.) Formal methods: applications and technology, 11th international workshop, FMICS 2006 and 5th international workshop pdmc 2006, Bonn, Germany, August 26–27, and August 31, 2006, revised selected papers. lecture notes in computer science, vol. 4346, pp. 132–147. Springer (2006)
Google Scholar
Wang J, Sun J, Qin S (2016) Verifying complex systems probabilistically through learning, abstraction and refinement. CoRR, abs/1610.06371

Download references

Acknowledgements

The work of B.Aichernig, M. Eichlseder and M. Tappler has been carried out as part of the TU Graz LEAD project "Dependable Internet of Things inAdverse Environments". The work of K. Larsen andG. Bacci has been supported by the Advanced ERC Grant no. 867096 (LASSO).

Funding

Open access funding provided by Graz University of Technology.

Author information

Authors and Affiliations

Institute of Software Technology, Graz University of Technology, Graz, Austria
Martin Tappler & Bernhard K. Aichernig
Schaffhausen Institute of Technology, Schaffhausen, Switzerland
Martin Tappler
Institute of Applied Information Processing and Communications, Graz University of Technology, Graz, Austria
Maria Eichlseder
Department of Computer Science, Aalborg University, Aalborg, Denmark
Giovanni Bacci & Kim G. Larsen

Authors

Martin Tappler
View author publications
You can also search for this author in PubMed Google Scholar
Bernhard K. Aichernig
View author publications
You can also search for this author in PubMed Google Scholar
Giovanni Bacci
View author publications
You can also search for this author in PubMed Google Scholar
Maria Eichlseder
View author publications
You can also search for this author in PubMed Google Scholar
Kim G. Larsen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Bernhard K. Aichernig.

Additional information

Annabelle McIver, Maurice ter Beek and Cliff Jones

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Tappler, M., Aichernig, B.K., Bacci, G. et al. $L^*$-based learning of Markov decision processes (extended version). Form Asp Comp 33, 575–615 (2021). https://doi.org/10.1007/s00165-021-00536-5

Download citation

Received: 17 March 2020
Revised: 10 November 2020
Accepted: 30 January 2021
Published: 31 March 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s00165-021-00536-5

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

\(L^*\)-based learning of Markov decision processes (extended version)

Abstract

Article PDF

Similar content being viewed by others

$$L^*$$ -Based Learning of Markov Decision Processes

Learning deterministic probabilistic automata from a model checking perspective

Active Automata Learning as Black-Box Search and Lazy Partition Refinement

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

\(L^*\)-based learning of Markov decision processes (extended version)

Abstract

Article PDF

Similar content being viewed by others

$$L^*$$ -Based Learning of Markov Decision Processes

Learning deterministic probabilistic automata from a model checking perspective

Active Automata Learning as Black-Box Search and Lazy Partition Refinement

Explore related subjects

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation