Improvement of Join Algorithms for Low-Selectivity Joins on MapReduce

Matono, Akiyoshi; Ogawa, Hirotaka; Kojima, Isao

doi:10.1007/978-3-319-19548-3_10

Akiyoshi Matono¹⁶,
Hirotaka Ogawa¹⁶ &
Isao Kojima¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9093))

Included in the following conference series:

Australasian Database Conference

1505 Accesses

Abstract

So far, many studies on join operations on MapReduce have already been proposed. Some of those studies focus on the low-selectivity joins that are frequently used in query processing for datasets among several management domains, such as those used with Linked Open Data. We found there is room for improvement of the state-of-the-art approach for the low-selectivity join on MapReduce called the per-split semi-join[5]. In this paper, we first thus extend the per-split semi-join to improve performance. Our approach can reduce three costs for low-selectivity joins: the amount of network traffic, the number of jobs, and the amount of disk I/O. Moreover, when the number of input relations is large, the selectivity becomes low and thus the effect of our proposed reductions is maximized. Therefore, we also propose an extension of the reduce-side join, which can easily apply three or more inputs, based on the extension of the per-split semi-join. In our experiments, we evaluated two comparisons: the per-split semi-join and our extension of it and the reduce-side join and our extension of it. The first experiment shows that our extension is better than its competitor in any case. In the second experiment, we found that our extension is superior to its competitor when the selectivity is low and the number of inputs is three or more.

Access provided by Autonomous University of Puebla. Download to read the full chapter text

Chapter PDF

Efficient Processing of Multi-way Joins Using MapReduce

A Theoretical and Experimental Comparison of Filter-Based Equijoins in MapReduce

SALA: A Skew-Avoiding and Locality-Aware Algorithm for MapReduce-Based Join

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

Afrati, F.N., Ullman, J.D.: Optimizing joins in a map-reduce environment. In: Proceedings of the 13th International Conference on Extending Database Technology, EDBT 2010, pp. 99–110. ACM, New York (2010). http://doi.acm.org/10.1145/1739041.1739056
Andreas, C.: Designing a Parallel Query Engine over Map/Reduce. Master’s thesis, Informatics MSc, School of Informatics, University of Edinburgh (2010)
Google Scholar
Atta, F.: Implementation and Analysis of Join Algorithms to handle skew for the Hadoop Map/Reduce Framework. Master’s thesis, Informatics MSc, School of Informatics, University of Edinburgh (2010)
Google Scholar
Berners-Lee, T.: Linked data (August 27, 2006). http://www.w3.org/DesignIssues/LinkedData.html
Blanas, S., Patel, J.M., Ercegovac, V., Rao, J., Shekita, E.J., Tian, Y.: A comparison of join algorithms for log processing in mapreduce. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, pp. 975–986. ACM, New York (2010). http://doi.acm.org/10.1145/1807167.1807273
Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970). http://doi.acm.org/10.1145/362686.362692
Chandar, J.: Join Algorithms using Map/Reduce. Master’s thesis, Informatics MSc, School of Informatics, University of Edinburgh (2010)
Google Scholar
Dean, J., Ghemawat, S.: Mapreduce: Simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating System Design & Implementation, OSDI 2004, vol. 6, p. 10. USENIX Association, Berkeley (2004). http://dl.acm.org/citation.cfm?id=1251254.1251264
DeWitt, D.J., Gerber, R.H.: Multiprocessor hash-based join algorithms. In: Proceedings of the 11th International Conference on Very Large Data Bases, VLDB 1985, vol. 11, pp. 151–164. VLDB Endowment (1985). http://dl.acm.org/citation.cfm?id=1286760.1286774
Gates, A.: Programming Pig. O’Reilly, Media (September 2011)
Google Scholar
Jadhav, V., Aghav, J., Dorwani, S.: Join algorithms using mapreduce: a survey. In: International Conference on Electrical Engineering and Computer Science, ICETACS 2013, Coimbatore, pp. 40–44, April 2013
Google Scholar
Radu, V.: Application. In: Radu, V. (ed.) Stochastic Modeling of Thermal Fatigue Crack Growth. ACM, vol. 1, pp. 63–70. Springer, Heidelberg (2015)
Google Scholar
Lee, K.H., Lee, Y.J., Choi, H., Chung, Y.D., Moon, B.: Parallel data processing with mapreduce: A survey. SIGMOD Rec. 40(4), 11–20 (2012). http://doi.acm.org/10.1145/2094114.2094118
Lee, T., Kim, K., Kim, H.J.: Join processing using bloom filter in mapreduce. In: Proceedings of the 2012 ACM Research in Applied Computation Symposium, RACS 2012, pp. 100–105. ACM, New York (2012). http://doi.acm.org/10.1145/2401603.2401626
Luo, G., Dong, L.: Adaptive join plan generation in hadoop. Tech. rep., Duke University (2010), cPS296.1 Course Project
Google Scholar
Pigul, A.: Generalized parallel join algorithms and designing cost models. In: Proceedings of the Spring Young Researcher’s Colloquium On Database and Information Systems, SYRCoDIS 2012, pp. 29–40 (2012)
Google Scholar
White, T.: Hadoop: The Definitive Guide. O’Reilly Media / Yahoo Press, 3rd edn. (May 2012)
Google Scholar
Zhang, C., Wu, L., Li, J.: Efficient processing distributed joins with bloomfilter using mapreduce. International Journal of Grid and Distributed Computing 6(3), 43–58 (2013)
Google Scholar

Download references

Author information

Authors and Affiliations

National Institute of Advanced Industrial Science and Technology, 1-1-1 Umezono Tsukuba, Ibaraki, Japan
Akiyoshi Matono, Hirotaka Ogawa & Isao Kojima

Authors

Akiyoshi Matono
View author publications
You can also search for this author in PubMed Google Scholar
Hirotaka Ogawa
View author publications
You can also search for this author in PubMed Google Scholar
Isao Kojima
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Akiyoshi Matono .

Editor information

Editors and Affiliations

University of Queensland, Brisbane, Queensland, Australia
Mohamed A. Sharaf
Monash University, Clayton, Australia
Muhammad Aamir Cheema
The University of Melbourne, Melbourne, Australia
Jianzhong Qi

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Matono, A., Ogawa, H., Kojima, I. (2015). Improvement of Join Algorithms for Low-Selectivity Joins on MapReduce. In: Sharaf, M., Cheema, M., Qi, J. (eds) Databases Theory and Applications. ADC 2015. Lecture Notes in Computer Science(), vol 9093. Springer, Cham. https://doi.org/10.1007/978-3-319-19548-3_10

Download citation

DOI: https://doi.org/10.1007/978-3-319-19548-3_10
Published: 28 May 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19547-6
Online ISBN: 978-3-319-19548-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Improvement of Join Algorithms for Low-Selectivity Joins on MapReduce

Abstract

Chapter PDF

Similar content being viewed by others

Efficient Processing of Multi-way Joins Using MapReduce

A Theoretical and Experimental Comparison of Filter-Based Equijoins in MapReduce

SALA: A Skew-Avoiding and Locality-Aware Algorithm for MapReduce-Based Join

Keywords

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Improvement of Join Algorithms for Low-Selectivity Joins on MapReduce

Abstract

Chapter PDF

Similar content being viewed by others

Efficient Processing of Multi-way Joins Using MapReduce

A Theoretical and Experimental Comparison of Filter-Based Equijoins in MapReduce

SALA: A Skew-Avoiding and Locality-Aware Algorithm for MapReduce-Based Join

Keywords

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation