Abstract
The dictionary matching with gaps problem is to preprocess a dictionary D of d gapped patterns P 1,…,P d over alphabet Σ, where each gapped pattern P i is a sequence of subpatterns separated by bounded sequences of don’t cares. Then, given a query text T of length n over alphabet Σ, the goal is to output all locations in T in which a pattern P i ∈ D, 1 ≤ i ≤ d, ends. There is a renewed current interest in the gapped matching problem stemming from cyber security. In this paper we solve the problem where all patterns in the dictionary have one gap with at least α and at most β don’t cares, where α and β are given parameters. Specifically, we show that the dictionary matching with a single gap problem can be solved in either O(dlogd + |D|) time and O(dlogε d + |D|) space, and query time O(n(β − α)loglogd log2 min { d, log|D| } + occ), where occ is the number of patterns found, or preprocessing time: O(d 2·ovr + |D|), where ovr is the maximal number of subpatterns including each other as a prefix or as a suffix, space: O(d 2 + |D|), and query time O(n(β − α) + occ), where occ is the number of patterns found. As far as we know, this is the best solution for this setting of the problem, where many overlaps may exist in the dictionary.
This research was supported by the Kabarnit Cyber consortium funded by the Chief Scientist in the Israeli Ministry of Economy under the Magnet Program.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Comm. ACM 18(6), 333–340 (1975)
Amir, A., Calinescu, G.: Alphabet independent and dictionary scaled matching. J. of Algorithms 36, 34–62 (2000)
Amir, A., Farach, M., Giancarlo, R., Galil, Z., Park, K.: Dynamic dictionary matching. Journal of Computer and System Sciences 49(2), 208–222 (1994)
Amir, A., Farach, M., Idury, R.M., La Poutré, J.A., Schäffer, A.A.: Improved dynamic dictionary matching. Information and Computation 119(2), 258–282 (1995)
Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Indexing and dictionary matching with one error. In: Dehne, F., Gupta, A., Sack, J.-R., Tamassia, R. (eds.) WADS 1999. LNCS, vol. 1663, pp. 181–192. Springer, Heidelberg (1999)
Bille, P., Gørtz, I.L., Vildhøj, H.W., Wind, D.K.: String matching with variable length gaps. Theoretical Computer Science (443), 25–34 (2012)
Bille, P., Thorup, M.: Faster regular expression matching. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds.) ICALP 2009, Part I. LNCS, vol. 5555, pp. 171–182. Springer, Heidelberg (2009)
Bille, P., Thorup, M.: Regular expression matching with multi-strings and intervals. In: Proc. 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1297–1308 (2010)
Brodal, G.S., Gasieniec, L.: Approximate dictionary queries. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 65–74. Springer, Heidelberg (1996)
Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proc. 36th Annual ACM Symposium on the Theory of Computing (STOC), pp. 91–100. ACM Press (2004)
Chan, T.M., Larsen, K.G., Pǎtraşcu, M.: Orthogonal range searching on the ram, revisited. In: Proc. 27th ACM Symposium on Computational Geometry (SoCG), pp. 1–10 (2011)
Fredriksson, K., Grabowski, S.: Efficient algorithms for pattern matching with general gaps, character classes and transposition invariance. Inf. Retr. 11(4), 338–349 (2008)
Haapasalo, T., Silvasti, P., Sippu, S., Soisalon-Soininen, E.: Online Dictionary Matching with Variable-Length Gaps. In: Pardalos, P.M., Rebennack, S. (eds.) SEA 2011. LNCS, vol. 6630, pp. 76–87. Springer, Heidelberg (2011)
Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The PROSITE database. Nucleic Acids Res. (27), 215–219 (1999)
Krishnamurthy, M., Seagren, E.S., Alder, R., Bayles, A.W., Burke, J., Carter, S., Faskha, E.: How to Cheat at Securing Linux. Syngress Publishing, Inc., Elsevier, Inc., 30 Corporate Dr., Burlington, MA 01803 (2008), e-edition: http://www.sciencedirect.com/science/book/9781597492072
Kucherov, G., Rusinowitch, M.: Matching a set of strings with variable length don’t cares. Theoret. Comput. Sci. 178(12), 129–154 (1997)
McCreight, E.M.: A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM 23(2), 262–272 (1976)
Morgante, M., Policriti, A., Vitacolonna, N., Zuccolo, A.: Structured motifs search. J. Comput. Bio. 12(8), 1065–1082 (2005)
Myers, G.: A four-russian algorithm for regular expression pattern matching. J. ACM 39(2), 430–448 (1992)
Myers, G., Mehldau, G.: A system for pattern matching applications on biosequences. CABIOS 9(3), 299–314 (1993)
Naa, J.C., Apostolico, A., Iliopoulos, C.S., Park, K.: Truncated suffix trees and their application to data compression. Theoretical Computer Science 304(3), 87–101 (2003)
Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. J. Comput. Bio. 10(6), 903–923 (2003)
Navarro, G., Raffinot, M.: New techniques for regular expression searching. Algorithmica 41(2), 89–116 (2004)
Rahman, M.S., Iliopoulos, C.S., Lee, I., Mohamed, M., Smyth, W.F.: Finding patterns with variable length gaps or don’t cares. In: Chen, D.Z., Lee, D.T. (eds.) COCOON 2006. LNCS, vol. 4112, pp. 146–155. Springer, Heidelberg (2006)
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)
van Emde Boas, P.: Preserving order in a forest in less than logarithmic time. In: Proceedings of the 16th Annual Symposium on Foundations of Computer Science, pp. 75–84 (1975)
Verint. Packet intrusion detection. Personal communication (2013)
Weiner, P.: Linear pattern matching algorithm. In: Proc. 14 IEEE Symposium on Switching and Automata Theory, pp. 1–11 (1973)
Zhang, M., Zhang, Y., Hu, L.: A faster algorithm for matching a set of patterns with variable length don’t cares. Inform. Process. Letters 110, 216–220 (2010)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Amir, A., Levy, A., Porat, E., Shalom, B.R. (2014). Dictionary Matching with One Gap. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds) Combinatorial Pattern Matching. CPM 2014. Lecture Notes in Computer Science, vol 8486. Springer, Cham. https://doi.org/10.1007/978-3-319-07566-2_2
Download citation
DOI: https://doi.org/10.1007/978-3-319-07566-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07565-5
Online ISBN: 978-3-319-07566-2
eBook Packages: Computer ScienceComputer Science (R0)