Abstract
A lot of recent work has studied strategies related to bulk loading of large data sets into multidimensional index structures. In this paper, we address the problem of bulk insertions into existing index structures with particular focus on R-trees — which are an important class of index structures used widely in commercial database systems. We propose a new technique, which as opposed to the current technique of inserting data one by one, bulk inserts entire new incoming datasets into an active R-tree. This technique, called GBI (for Generalized Bulk Insertion), partitions the new datasets into sets of clusters and outliers, constructs an R-tree (small tree) from each cluster, identifies and prepares suitable locations in the original R-tree (large tree) for insertion, and lastly performs the insertions of the small trees and the outliers into the large tree in bulk. Our experimental studies demonstrate that GBI does especially well (over 200% better than the existing technique) for randomly located data as well as for real datasets that contain few natural clusters, while also consistently outperforming the alternate technique in all other circumstances.
This work was supported in part by the University of Michigan ITS Research Center of Excellence grant (DTFH61-93-X-00017-Sub) sponsored by the U.S. Dept. of Transportation and by the Michigan Dept. of Transportation. Dr. Rundensteiner thanks IBM for the Corporate IBM partnership award, and Li Chen thanks IBM for the Corporate IBM fellowship as well as mentoring from the IBM Toronto Labora.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
L. Arge, K. Hinrichs, J. Vahrenhold, and J. Vitter. Efficient Bulk Operations on Dynamic R-trees. Workshop on Algorithm Engineering and Experimentation ALENEX 99, pages 92–103, 1999.
M. R. Anderberg. Probability and Mathematical Statistics. Academic Press, New York, San Francisco, London, 1973.
C.H. Ang and T.C. Tan. New Linear Node Splitting Algorithm for R-trees. Advances in Spatial Databases, pages 339–349, 1997.
N. Beckmann, H.P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. Proceedings of SIGMOD, pages 322–331, 1990.
J. Bercken, P. Widmayer, and B. Seeger. A Generic Approach to Bulk Loading Multidimensional Index Structures. International Conference on Very Large Data Bases, pages 406–415, 1997.
L. Chen, R. Choubey, and E. A. Rundensteiner. Bulk Insertions into R-trees using the Small-Tree-Large-Tree Approach. Proceedings of ACM GIS Workshop, pages 161–162, 1998.
L. Chen, R. Choubey, and E.A. Rundensteiner. Bulk Insertions into R-Trees. WPI, Tech. Rep. CS-WPI-98-05, February 1998.
R. Choubey, L. Chen, and E. A. Rundensteiner. GBI: A Generalized R-Tree Bulk Insertion Strategy. WPI Technical Report TR-98-15-STLT, 1998.
W. Chen. Programming with Logical Queries, Bulk Updates, and Hypothetical Reasoning. IEEE Transactions on Knowledge and Data Engineering, pages 587–599, July 1997.
R. Choubey. R-Tree Bulk Insertion Strategies. Master Thesis in progress, Worcester Polytechnic Institute, 1999.
P. Ciaccia and M. Patella. Bulk Loading the M-tree. Proceedings of the Australasian Database Conference, February 1998.
B. Duran and P. Odell.Cluster Analysis-A Survey. Springer-Verlag, Berlin, Heidelberg, New York, 1974.
A. Guttman. R-trees: A Dynamic Index Structure for Spatial Searching. Proceedings of SIGMOD, pages 47–57, 1984.
Y.W. Huang, N. Jing, and E.A. Rundensteinder. Spatial Joins Using R-trees: Breadth-First Traversal with Global Optimizations. International Conference on Very Large Data Bases, pages 396–405, 1997.
Y.W. Huang, M. Jones, and E.A. Rundensteiner. Symbolic Intersect Detection: A Method for Improving Spatial Intersect Joins. Proc. of the International Symposium on Spatial Databases, pages 165–177, 1997.
Y.W. Huang, N Jing, and E.A. Rundensteiner. A cost model for estimating the performance of spatial joins using R-tree. International Working Conference on Scientific and Statistical Database Management, pages 30–38, August 1997.
Informix Corporation (“http://www.informix.com”). Informix. 108
I. Kamel and C. Faloutsos. On Packing R-trees. Proceedings of International Conference on Information and Knowledge Management, pages 490–499, November 1993.
S. Leutenegger, M. Lopez, and J. Edgigton. STR: A Simple and Efficient Algorithm for R-tree Packing. Proceedings of IEEE International Conference on Data Engineering, pages 497–506, 1997.
S. Leutenegger, M. Lopez, and Y. Garcia. A Greedy Algorithm for Bulk Loading R-trees. Technical report, University of Denver Computer Science Technical Report #97-02, 1997.
S. Leutenegger and D. Nicol. Efficient Bulk-Loading of Gridles. IEEE Transactions on Knowledge and Data Engineering, pages 410–420, May 1997.
J. Li, D. Rotem, and J. Srivastave. Algorithms for Loading Parallel Gridles. Proceedings of SIGMOD, pages 347–356, 1993.
MapInfo Corporation. SpatialWare-http://www.mapinfo.com/spatialware/spatial20.html.
A. Moitra. Spatio-Temporal Data Management Using R-trees. International Journal of Geographic Information Systems, 1993.
H.Charles Romesburg. Cluster Analysis for Researchers. Lifetime Learning Publications, Belmont, California, 1984.
N. Roussopoulos, M. Roussopoulos, and Y. Kotidis. Cubetree: Organization of and Bulk Incremental Updates on the Data Cube. Proceedings of SIGMOD, pages 89–99, 1997.
H. Spath. Cluster Analysis Algorithms for Data Reduction and Classification of Objects. Ellis Horwook Publishers, Chichester, 1982.
Y. Theodoridis and T. Sellis. Optimization Issues in R-tree Construction (extended abstract). Lecture Notes in Computer Science, pages 270–273, 1994.
J. H. Ward. Hierarchical Grouping Analysis For Applications. Journal of American Statistics Association, 58:236–244, 1963.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Choubey, R., Chen, L., Rundensteiner, E.A. (1999). GBI: A Generalized R-Tree Bulk-Insertion Strategy. In: Güting, R.H., Papadias, D., Lochovsky, F. (eds) Advances in Spatial Databases. SSD 1999. Lecture Notes in Computer Science, vol 1651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48482-5_8
Download citation
DOI: https://doi.org/10.1007/3-540-48482-5_8
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66247-1
Online ISBN: 978-3-540-48482-0
eBook Packages: Springer Book Archive