Abstract
Skew effects are a serious problem in parallel database systems, but the relationship between different skew types and load balancing methods is still not fully understood. We develop two classifications, one of skew effects and one of load balancing strategies, and compare them to match their relevant properties.
Our conclusions highlight the importance of highly dynamic scheduling to optimize both the complexity and the success of load balancing. We also suggest the tuning of database schemata as a new anti-skew measure.
© 2001 Springer-Verlag Berlin Heidelberg
Märtens, H. (2001). A Classification of Skew Effects in Parallel Database Systems. In: Sakellariou, R., Gurd, J., Freeman, L., Keane, J. (eds) Euro-Par 2001 Parallel Processing. Euro-Par 2001. Lecture Notes in Computer Science, vol 2150. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44681-8_42
DOI: https://doi.org/10.1007/3-540-44681-8_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42495-6
Online ISBN: 978-3-540-44681-1