Abstract
In many recent applications of database management systems, data may be stored in user-defined complex data types (such as sequences). However, commercially available database management systems do not support efficient querying of such data, so efficient indexing schemes for complex data types need to be developed. In this paper we focus primarily on indexing non-timestamped sequences of sets of categorical data, specifically on indexing for set subsequence queries. We address both the logical structure and the implementation issues of such indexes. Our main contributions are threefold. First, we specify the logical structure of the index and propose algorithms for set subsequence query execution that utilize the index structure. Second, we propose an implementation of such an index that uses only means available in all "off-the-shelf" database management systems. Finally, we experimentally evaluate the performance of the index.
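To make the query type concrete, the following is a minimal sketch of a set subsequence query answered by a full scan, with no index at all. It is not the AISS index or the authors' algorithm. It assumes the usual containment semantics, which the abstract does not spell out: a query sequence of sets matches a data sequence if each query set is a subset of a distinct data set, with order preserved and gaps allowed. The names `is_set_subsequence` and `naive_query` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence

ItemSet = FrozenSet[str]


def is_set_subsequence(query: Sequence[ItemSet], sequence: Sequence[ItemSet]) -> bool:
    """Return True if every query set is contained in a distinct set of
    `sequence`, in the same order (assumed semantics, gaps allowed)."""
    pos = 0
    for q in query:
        # Greedily advance to the earliest remaining set that contains q.
        while pos < len(sequence) and not q <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False  # ran out of data sets before matching q
        pos += 1  # the next query set must match a strictly later data set
    return True


def naive_query(database: Dict[int, List[ItemSet]], query: Sequence[ItemSet]) -> List[int]:
    """Baseline without an index: test every stored sequence."""
    return [sid for sid, seq in database.items() if is_set_subsequence(query, seq)]


# Toy database of non-timestamped sequences of sets of categorical items.
database = {
    1: [frozenset({"a", "b"}), frozenset({"c"}), frozenset({"a", "d"})],
    2: [frozenset({"c"}), frozenset({"a"})],
    3: [frozenset({"a"}), frozenset({"b"})],
}
query = [frozenset({"a"}), frozenset({"c"})]
result = naive_query(database, query)
```

With this toy data, `result` contains only sequence 1. A scan like this visits every stored sequence for every query, which is the cost an index for set subsequence queries is meant to avoid.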
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Andrzejewski, W., Morzy, T. (2006). AISS: An Index for Non-timestamped Set Subsequence Queries. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2006. Lecture Notes in Computer Science, vol 4081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823728_48
DOI: https://doi.org/10.1007/11823728_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37736-8
Online ISBN: 978-3-540-37737-5
eBook Packages: Computer Science (R0)