Abstract
In data storage applications, a large collection of consecutively numbered data “buckets” is often mapped to a relatively small collection of consecutively numbered storage “bins.” For example, in parallel database applications, buckets correspond to hash buckets of data and bins correspond to database nodes. In disk array applications, buckets correspond to logical tracks and bins correspond to physical disks in an array. Measures of the “goodness” of a mapping method include:
(1) The time (number of operations) needed to compute the mapping.
(2) The storage needed to store a representation of the mapping.
(3) The balance of the mapping, i.e., the extent to which all bins receive the same number of buckets.
(4) The cost of relocation, that is, the number of buckets that must be relocated to a new bin if a new mapping is needed due to an expansion of the number of bins or the number of buckets.
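Measures (3) and (4) can be made concrete with a small sketch. The definitions below are assumptions for illustration, not necessarily the paper's formal ones: imbalance is taken as the largest bin load minus the smallest (0 or 1 being the best achievable), and relocation cost as the number of buckets whose bin changes. The round-robin baseline `modulo_mapping` (bucket i goes to bin i mod n) is a hypothetical comparison point, not a method from the paper.

```python
from collections import Counter

def bin_loads(mapping, num_bins):
    """Count the buckets assigned to each bin 0..num_bins-1.

    mapping[i] is the bin assigned to bucket i.
    """
    counts = Counter(mapping)
    return [counts[b] for b in range(num_bins)]

def imbalance(mapping, num_bins):
    """Largest minus smallest bin load; 0 or 1 is the best achievable."""
    loads = bin_loads(mapping, num_bins)
    return max(loads) - min(loads)

def relocation_cost(old_mapping, new_mapping):
    """Number of buckets whose bin differs between the two mappings."""
    return sum(1 for old, new in zip(old_mapping, new_mapping) if old != new)

def modulo_mapping(num_buckets, num_bins):
    """Hypothetical round-robin baseline: bucket i goes to bin i mod num_bins."""
    return [i % num_bins for i in range(num_buckets)]

# 20 buckets, growing from 4 bins to 5 bins.
before = modulo_mapping(20, 4)
after = modulo_mapping(20, 5)
print(imbalance(before, 4), imbalance(after, 5))  # 0 0
print(relocation_cost(before, after))             # 16
```

The modulo mapping is perfectly balanced at both sizes, yet it moves 16 of the 20 buckets. A balanced 5-bin mapping only needs to move 4, since the new bin must end up holding 20 / 5 = 4 buckets and every other bucket can stay put. This gap is what measure (4) captures.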
One contribution of this paper is to give a new mapping method, the Interval-Round-Robin (IRR) method. The IRR method has optimal balance and relocation cost, and its time complexity and storage requirements compare favorably with known methods. Specifically, if m is the number of times that the number of bins and/or buckets has increased, then the time complexity is O(log m) and the storage is O(m^2). Another contribution of the paper is to identify the concept of a history-independent mapping, meaning informally that the mapping does not “remember” the past history of expansions to the number of buckets and bins, but only the current number of buckets and bins. Thus, such mappings require very little information to be stored. Assuming that balance and relocation are optimal, we prove that history-independent mappings are possible if the number of buckets is fixed (so only the number of bins can increase), but not possible if the number of bins and buckets can both increase.
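The abstract does not spell out the IRR construction, so the sketch below is not it. It is a hypothetical explicit-table baseline (`add_bin` is an illustrative name) for the fixed-bucket case, where bins are added one at a time. Assuming the input mapping is already balanced, it repeatedly moves a bucket from the most loaded old bin to the new bin until every old bin holds at most one more bucket than the new one. Each step therefore moves exactly floor(B / (n + 1)) buckets for B buckets and n + 1 bins, which is the minimum any balanced result needs. The cost is storage: it keeps one table entry per bucket, which is measure (2) that methods such as IRR are designed to reduce.

```python
def add_bin(mapping, num_bins):
    """Hypothetical sketch: add bin `num_bins` to a balanced mapping.

    mapping[i] is the bin (0..num_bins-1) of bucket i; the bucket count is fixed.
    Returns (new_mapping, number_of_buckets_moved).
    """
    new_bin = num_bins
    # Explicit table of which buckets currently sit in each old bin.
    members = [[] for _ in range(num_bins)]
    for bucket, b in enumerate(mapping):
        members[b].append(bucket)

    new_mapping = list(mapping)
    moved = 0  # also the current load of the new bin
    while True:
        # Pick the most loaded old bin (the lowest-numbered one on ties).
        donor = max(range(num_bins), key=lambda b: len(members[b]))
        # Stop once every old bin holds at most one more bucket than the new bin.
        if len(members[donor]) - moved < 2:
            break
        bucket = members[donor].pop()
        new_mapping[bucket] = new_bin
        moved += 1
    return new_mapping, moved

# Grow from 1 bin to 5 bins with 20 buckets, one bin at a time.
mapping = [0] * 20
for n in range(1, 5):
    mapping, moved = add_bin(mapping, n)
    print(f"{n + 1} bins: moved {moved} buckets")
# Moves 10, 6, 5, 4 buckets; every bin ends with 4 buckets.
```

Because the result depends on which buckets were moved at each earlier step, this table-driven scheme is history-dependent, in contrast to the history-independent mappings discussed above.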
Additional information
Communicated by C. K. Wang.
About this article
Cite this article
Choy, D.M., Fagin, R. & Stockmeyer, L. Efficiently extendible mappings for balanced data distribution. Algorithmica 16, 215–232 (1996). https://doi.org/10.1007/BF01940647
DOI: https://doi.org/10.1007/BF01940647