Efficiently extendible mappings for balanced data distribution

Choy, D. M.; Fagin, R.; Stockmeyer, L.

doi:10.1007/BF01940647

Published: August 1996

Algorithmica, Volume 16, pages 215–232 (1996)

Abstract

In data storage applications, a large collection of consecutively numbered data “buckets” are often mapped to a relatively small collection of consecutively numbered storage “bins.” For example, in parallel database applications, buckets correspond to hash buckets of data and bins correspond to database nodes. In disk array applications, buckets correspond to logical tracks and bins correspond to physical disks in an array. Measures of the “goodness” of a mapping method include:

(1) The time (number of operations) needed to compute the mapping.
(2) The storage needed to store a representation of the mapping.
(3) The balance of the mapping, i.e., the extent to which all bins receive the same number of buckets.
(4) The cost of relocation, that is, the number of buckets that must be relocated to a new bin if a new mapping is needed due to an expansion of the number of bins or the number of buckets.
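
To make measures (3) and (4) concrete, the following minimal sketch evaluates a plain modulo (round-robin) mapping. The mapping and the helper names `round_robin`, `balance_spread`, and `relocation_cost` are illustrative assumptions, not taken from the paper; the sketch only shows how balance and relocation cost could be counted for a candidate mapping.

```python
from collections import Counter


def round_robin(bucket, num_bins):
    """Hypothetical baseline mapping: bucket number modulo the bin count."""
    return bucket % num_bins


def balance_spread(num_buckets, num_bins, mapping):
    """Difference between the fullest and the emptiest bin (0 or 1 is ideal)."""
    counts = Counter(mapping(b, num_bins) for b in range(num_buckets))
    loads = [counts.get(i, 0) for i in range(num_bins)]
    return max(loads) - min(loads)


def relocation_cost(num_buckets, old_bins, new_bins, mapping):
    """Number of buckets whose bin changes when the bin count changes."""
    return sum(
        1
        for b in range(num_buckets)
        if mapping(b, old_bins) != mapping(b, new_bins)
    )


# 12 buckets, expanding from 4 bins to 5 bins.
spread_before = balance_spread(12, 4, round_robin)
spread_after = balance_spread(12, 5, round_robin)
moved = relocation_cost(12, 4, 5, round_robin)
print(spread_before, spread_after, moved)
```

In this example the modulo mapping stays balanced before and after the expansion, but it relocates 8 of the 12 buckets, whereas a balanced fifth bin needs to receive only 2. This gap is the kind of relocation cost the measures above are meant to expose.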

One contribution of this paper is to give a new mapping method, the Interval-Round-Robin (IRR) method. The IRR method has optimal balance and relocation cost, and its time complexity and storage requirements compare favorably with known methods. Specifically, if m is the number of times that the number of bins and/or buckets has increased, then the time complexity is O(log m) and the storage is O(m²). Another contribution of the paper is to identify the concept of a history-independent mapping, meaning informally that the mapping does not “remember” the past history of expansions to the number of buckets and bins, but only the current number of buckets and bins. Thus, such mappings require very little information to be stored. Assuming that balance and relocation are optimal, we prove that history-independent mappings are possible if the number of buckets is fixed (so only the number of bins can increase), but not possible if the number of bins and buckets can both increase.
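
The abstract does not describe how the IRR method works, so the sketch below is not IRR and not the paper's construction. It is a simple greedy rule, under the assumptions that the number of buckets is fixed and bins are added one at a time, which illustrates what balanced expansion with minimal relocation can look like: a new bin joining n existing bins must receive at least floor(B / (n + 1)) of the B buckets to be balanced, and the rule moves exactly that many, always taking from the currently fullest bin. The names `add_bin` and `grow` are hypothetical.

```python
def add_bin(assignment, num_bins):
    """Add bin number `num_bins` to a balanced assignment over bins 0..num_bins-1.

    `assignment[bucket]` is the bin of that bucket. Returns the new
    assignment and the number of buckets that were moved.
    """
    new_bin = num_bins
    quota = len(assignment) // (num_bins + 1)  # fewest buckets a balanced new bin can hold

    # Group bucket numbers by their current bin.
    bins = [[] for _ in range(num_bins)]
    for bucket, b in enumerate(assignment):
        bins[b].append(bucket)

    result = list(assignment)
    for _ in range(quota):
        # Take from the fullest bin; ties go to the lowest bin number.
        donor = max(range(num_bins), key=lambda i: len(bins[i]))
        bucket = bins[donor].pop()
        result[bucket] = new_bin
    return result, quota


def grow(num_buckets, target_bins):
    """Start with every bucket in bin 0 and add bins one at a time."""
    assignment = [0] * num_buckets
    moves = []
    for n in range(1, target_bins):
        assignment, moved = add_bin(assignment, n)
        moves.append(moved)
    return assignment, moves


final_assignment, moves_per_step = grow(12, 5)
final_loads = [final_assignment.count(i) for i in range(5)]
print(final_loads, moves_per_step)
```

Only buckets that end up in the new bin are ever moved, and bin loads differ by at most one after each step. The sketch stores the full assignment rather than a compact representation, so it says nothing about the time and storage bounds quoted above.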

Author information

Authors and Affiliations

IBM Research Division, Almaden Research Center, 650 Harry Road, San Jose, CA 95120-6099, USA
D. M. Choy, R. Fagin & L. Stockmeyer

Additional information

Communicated by C. K. Wang.

About this article

Cite this article

Choy, D.M., Fagin, R. & Stockmeyer, L. Efficiently extendible mappings for balanced data distribution. Algorithmica 16, 215–232 (1996). https://doi.org/10.1007/BF01940647

Received: 22 March 1994
Revised: 16 October 1994
Issue Date: August 1996
DOI: https://doi.org/10.1007/BF01940647
