Abstract
This paper offers a new efficient regular expression matching algorithm which follows the POSIX-type leftmost-longest rule. The algorithm basically emulates the subset construction without backtracking, so that its computational cost even in the worst case does not explode exponentially; the time complexity of the algorithm is O(mn(nā+āc)), where m is the length of a given input string, n the number of occurrences of the most frequently used letter in a given regular expression and c the number of subexpressions to be used for capturing substrings. A formalization of the leftmost-longest semantics by using parse trees is also discussed.
This work is supported by Japan Society for Promotion of Science, Basic Research (C) No.22500019.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
References
The Open Group Base Specification Issue 6 IEEE Std 1003.1 2004 Edition (2004), http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html
Cox, R.: Regular Expression Matching Can Be Simple and Fast (2007), http://swtch.com/~rsc/regexp/regexp1.html
Cox, R.: Regular Expression Matching in the Wild (2010), http://swtch.com/~rsc/regexp/regexp3.html
DubĆ©, D., Feeley, M.: Efficiently Building a Parse Tree from a Regular Expression. Acta InfomaticaĀ 37(2), 121ā144 (2000)
Fowler, G.: An Iterpretation of the POSIX Regex Standard (2003), http://www2.research.att.com/~gsf/testregex/re-interpretation.html
Frisch, A., Cardelli, L.: Greedy Regular Expression Matching. In: DĆaz, J., KarhumƤki, J., Lepistƶ, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol.Ā 3142, pp. 618ā629. Springer, Heidelberg (2004)
Glushkov, V.M.: The Abstract Theory of Automata. Russian Mathematical SurveysĀ 16(5), 1ā53 (1961)
Kuklewicz, C.: Regular Expressions: Bounded Space Proposal (2007), http://www.haskell.org/haskellwiki/Regular_expressions/Bounded_space_proposal
Laurikari, V.: Efficient Submatch Addressing for Regular Expressions. Masterās thesis, Helsinki University of Technology (2001)
McNaughton, R., Yamada, H.: Regular Expressions and State Graphs for Automata. IEEE Transactions on Electronic ComputersĀ 9, 39ā47 (1960)
Vansummeren, S.: Type Inference for Unique Pattern Matching. ACM Transactions on Programming Languages and SystemsĀ 28(3), 389ā428 (2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
Ā© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Okui, S., Suzuki, T. (2011). Disambiguation in Regular Expression Matching via Position Automata with Augmented Transitions. In: Domaratzki, M., Salomaa, K. (eds) Implementation and Application of Automata. CIAA 2010. Lecture Notes in Computer Science, vol 6482. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18098-9_25
Download citation
DOI: https://doi.org/10.1007/978-3-642-18098-9_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-18097-2
Online ISBN: 978-3-642-18098-9
eBook Packages: Computer ScienceComputer Science (R0)