Abstract
In Indexing with Gaps one seeks to index a text to allow pattern queries that allow gaps within the pattern query. Formally a gapped-pattern over alphabet Σ is a pattern of the form p = p 1 g 1 p 2 g 2 ⋯ g ℓ p ℓ + 1, where ∀ i, p i ∈ Σ* and each g i is a gap length ∈ N. Often one considers these patterns with some bound constraints, for example, all gaps are bounded by a gap-bound G.
Near-optimal solutions have, lately, been proposed for the case of one gap only with a predetermined size. More specifically, an indexing solution for patterns of the form p 1·g·p 2, where g is known apriori. In this case the solutions mentioned are preprocessed in O(n logε n) time and O(n) space, where the pattern queries are answered in O(|p 1| + |p 2|), for constant sized alphabets. For the more general case when there is a bound G these results can be easily adapted with a multiplicative factor of O(G) for the preprocessing, i.e. O(n logε nG) preprocessing time and O(nG) preprocessing space. Alas, these solutions do not lend to more than one gap.
In this paper we propose a solution for k gaps one with preprocessing time O(nG 2k logk n loglogn) and space of O(nG 2k logk n) and query time O(m + 2k loglogn), where m = ∑ i = 1 |p i |.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Amir, A., Keselman, D., Landau, G., Lewenstein, N., Lewenstein, M., Rodeh, M.: Text indexing and dictionary matching with one error. J. of Algorithms 37(2), 309–325 (2000)
Amir, A., Landau, G., Lewenstein, M., Sokol, D.: Dynamic pattern, static text matching. ACM Transactions on Algorithms 3(2) (2007)
Bille, P., Gørtz, I.L.: Substring range reporting. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 299–308. Springer, Heidelberg (to apppear, 2011)
Bille, P., Li Gørtz, I., Vildhøj, H.W., Wind, D.K.: String matching with variable length gaps. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 385–394. Springer, Heidelberg (2010)
Clifford, P., Clifford, R.: Self-normalised distance with don’t cares. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 63–70. Springer, Heidelberg (2007)
Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of the Symposium On Theory of Computing (STOC), pp. 91–100 (2004)
Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: Proceedings of the Symposium On Theory of Computing (STOC), pp. 592–601 (2002)
Crochemore, M., Iliopoulos, C., Makris, C., Rytter, W., Tsakalidis, A., Tsichlas, K.: Approximate string matching with gaps. Nordic J. of Computing 9(1), 54–65 (2002)
Ferragina, P., Muthukrishnan, S., de Berg, M.: Multi-method dispatching: A geometric approach with applications to string matching problems. In: Proceedings of the Symposium on Theory of Computing (STOC), pp. 483–491 (1999)
Fischer, M., Paterson, M.: String matching and other products. In: Karp, R.M. (ed.) Complexity of Computation, SIAM-AMS Proceedings, vol. 7, pp. 113–125 (1974)
Harel, D., Tarjan, R.: Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing 13(2), 338–355 (1984)
Iliopoulos, C., Rahman, M.: Indexing factors with gaps. Algorithmica 55(1), 60–70 (2008)
Indyk, P.: Faster algorithms for string matching problems: Matching the convolution bound. In: Proceedings of the Symposium on Foundations of Computer Science (FOCS), pp. 166–173 (1998)
Kalai, A.: Efficient pattern matching with don’t cares. In: Proceedings of the Symposium on Discrete Algorithms (SODA), pp. 655–656 (2002)
Lam, T.-W., Sung, W.-K., Tam, S.-L., Yiu, S.-M.: Space efficient indexes for string matching with don’t cares. In: Tokuyama, T. (ed.) ISAAC 2007. LNCS, vol. 4835, pp. 846–857. Springer, Heidelberg (2007)
Farach-Colton, S.M.M., Ferragina, P.: On the sorting-complexity of suffix tree construction. J. ACM 47(1), 987–1011 (2000)
McCreight, E.M.: A space-economical suffix tree construction algorithm. J. ACM 23(2), 262–272 (1976)
Peterlongo, M.S.P., Allali, J.: Indexing gapped-factors using a tree. Int. J. Found. Comput. Sci. 19(1), 71–87 (2008)
Schieber, B., Vishkin, U.: On finding lowest common ancestors: simplifications and parallelization. SIAM Journal on Computing 17(6), 1253–1262 (1988)
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)
Weiner, P.: Linear pattern matching algorithms. In: 14th Annual Symposium on Switching and Automata Theory, pp. 1–11. IEEE, Los Alamitos (1973)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lewenstein, M. (2011). Indexing with Gaps. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds) String Processing and Information Retrieval. SPIRE 2011. Lecture Notes in Computer Science, vol 7024. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24583-1_14
Download citation
DOI: https://doi.org/10.1007/978-3-642-24583-1_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24582-4
Online ISBN: 978-3-642-24583-1
eBook Packages: Computer ScienceComputer Science (R0)