Breaking the MapReduce stage barrier

Verma, Abhishek; Cho, Brian; Zea, Nicolas; Gupta, Indranil; Campbell, Roy H.

doi:10.1007/s10586-011-0182-7

Breaking the MapReduce stage barrier

Published: 10 September 2011

Volume 16, pages 191–206, (2013)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Cluster Computing Aims and scope Submit manuscript

Breaking the MapReduce stage barrier

Download PDF

Abhishek Verma¹,
Brian Cho¹,
Nicolas Zea¹,
Indranil Gupta¹ &
…
Roy H. Campbell¹

631 Accesses
20 Citations
Explore all metrics

Abstract

The MapReduce model uses a barrier between the Map and Reduce stages. This provides simplicity in both programming and implementation. However, in many situations, this barrier hurts performance because it is overly restrictive. Hence, we develop a method to break the barrier in MapReduce in a way that improves efficiency. Careful design of our barrier-less MapReduce framework results in equivalent generality and retains ease of programming. We motivate our case with, and experimentally study our barrier-less techniques in, a wide variety of MapReduce applications divided into seven classes. Our experiments show that our approach can achieve better job completion times than a traditional MapReduce framework. This is due primarily to the interleaving of I/O and computation, and forgoing disk-intensive work. We achieve a reduction in job completion times that is 25% on average and 87% in the best case.

References

Apache hadoop. http://hadoop.apache.org (2011)
Ananthanarayanan, G., Kandula, S., Greenberg, A., Stoica, I., Lu, Y., Saha, B., Harris, E.: Reining in the outliers in map-reduce clusters using mantri. In: Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI’10, pp. 1–16. USENIX Association, Berkeley (2010)
Google Scholar
Black, F., Scholes, M.S.: The pricing of options and corporate liabilities. J. Polit. Econ. 81(3), 637–654 (1973)
Article Google Scholar
Brants, T., Popat, A.C., Xu, P., Och, F.J., Dean, J.: Large language models in machine translation. In: Proc. of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 858–867 (2007)
Google Scholar
Chu, C., Kim, S., Lin, Y., Yu, Y., Bradski, G., Ng, A., Olukotun, K.: Map-reduce for machine learning on multicore. In: Advances in Neural Information Processing Systems (NIPS), pp. 281–288 (2007)
Google Scholar
Condie, T., Conway, N., Alvaro, P., Hellerstein, Elmeleegy, K., Sears, R.: Online aggregation and continuous query support in mapreduce. In: NSDI’10: Seventh USENIX Symposium on Networked Systems Design and Implementation. USENIX Association, Berkeley (2010)
Google Scholar
Dean, J., Ghemawat, S.: Mapreduce: Simplified data processing on large clusters. In: Proc. of the USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 137–149 (2004)
Google Scholar
DeWitt, D., Gerber, R.: Multiprocessor hash-based join algorithms. In: Proc. of VLDB Conference. Citeseer, Princeton (1985)
Google Scholar
Dyer, C., Cordova, A., Mont, A., Lin, J.: Fast, easy, and cheap: construction of statistical machine translation models with mapreduce. In: Proc. of the Third Workshop on Statistical Machine Translation (StatMT), pp. 199–207. Association for Computational Linguistics, Stroudsburg (2008)
Chapter Google Scholar
Elsayed, T., Lin, J., Oard, D.W.: Pairwise document similarity in large collections with mapreduce. In: Proc. of the Annual Meeting of the Association for Computational Linguistics on Human Language Technologies (HLT), pp. 265–268. Association for Computational Linguistics, Stroudsburg (2008)
Google Scholar
Farivar, R., Verma, A., Chan, E., Campbell, R.: Mithra: multiple data independent tasks on a heterogeneous resource architecture. In: Proc. of IEEE International Conference on Cluster Computing and Workshops (CLUSTER), pp. 1–10 (2009). doi:10.1109/CLUSTR.2009.5289201
Google Scholar
Fix, E., Hodges, J.: Discriminatory analysis, nonparametric discrimination: Consistency properties. Tech. rep., USAF School of Aviation Medicine, Randolph Field, Texas (1951)
Guibas, L.J., Sedgewick, R.: A dichromatic framework for balanced trees. In: Proc. of Symposium on Foundations of Computer Science, pp. 8–21 (1978). doi:10.1109/SFCS.1978.3
Google Scholar
Gupta, A., Mumick, I.: Maintenance of materialized views: problems, techniques, and applications. Bull. Tech. Commun. 51, 3 (2010)
Google Scholar
Illinois Cloud Computing Testbed. http://cloud.cs.illinois.edu/ (2011)
Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D.: Dryad: distributed data-parallel programs from sequential building blocks. In: Proc. of European Conference on Computer Systems (EuroSys), pp. 59–72 (2007)
Google Scholar
Mongodb. http://www.mongodb.org (2011)
Olson, M.A., Bostic, K., Seltzer, M.: Berkeley db. In: Proc. of the USENIX Annual Technical Conference (ATEC), p. 43. USENIX Association, Berkeley (1999)
Google Scholar
O’Malley, O., Murthy, A.C.: Winning a 60 second dash with a yellow elephant. http://sortbenchmark.org/Yahoo2009.pdf (2009)
Pavlo, A., Paulson, E., Rasin, A., Abadi, D.J., Dewitt, D.J., Madden, S., Stonebraker, M.: A comparison of approaches to large-scale data analysis. In: Proc. of the ACM SIGMOD International Conference. ACM (2009). URL http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf
Google Scholar
Popa, L., Budiu, M., Yu, Y., Isard, M.: DryadInc: Reusing work in large-scale computations. In: Proc. of Workshop on Hot Topics in Cloud Computing (HotCloud) (2009)
Google Scholar
Stonebraker, M., Abadi, D., DeWitt, D., Madden, S., Paulson, E., Pavlo, A., Rasin, A.: MapReduce and parallel DBMSs: friends or foes? Commun. ACM 53(1), 64–71 (2010)
Article Google Scholar
Tokyo cabinet. http://1978th.net/tokyocabinet/ (2011)
Verma, A., Llora, X., Goldberg, D.E., Campbell, R.H.: Scaling genetic algorithms using mapreduce. In: Proc. of International Conference on Intelligent Systems Design and Applications (2009)
Google Scholar
Verma, A., Zea, N., Cho, B., Gupta, I., Campbell, R.: Breaking the MapReduce stage barrier. In: Proc. of IEEE International Conference on Cluster Computing (CLUSTER), pp. 235–244 (2010)
Google Scholar
White, T.: Hadoop: The Definitive Guide. O’Reilly Media, Sebastopol (2009)
Google Scholar
Yang, H., Dasdan, A., Hsiao, R., Parker, D.: Map-reduce-merge: simplified relational data processing on large clusters. In: Proc. of the 2007 ACM SIGMOD International Conf. on Management of data (2007)
Google Scholar
Zaharia, M., Konwinski, A., Joseph, A.D., Katz, R., Stoica, I.: Improving mapreduce performance in heterogeneous environments. In: Proc. of the USENIX Symposium on Operating Systems Design and Implementation (OSDI) (2008)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, IL, 61801-2302, USA
Abhishek Verma, Brian Cho, Nicolas Zea, Indranil Gupta & Roy H. Campbell

Authors

Abhishek Verma
View author publications
You can also search for this author in PubMed Google Scholar
Brian Cho
View author publications
You can also search for this author in PubMed Google Scholar
Nicolas Zea
View author publications
You can also search for this author in PubMed Google Scholar
Indranil Gupta
View author publications
You can also search for this author in PubMed Google Scholar
Roy H. Campbell
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Abhishek Verma.

Additional information

This is an extended version of our IEEE Cluster 2010 paper [25].

Rights and permissions

Reprints and permissions

About this article

Cite this article

Verma, A., Cho, B., Zea, N. et al. Breaking the MapReduce stage barrier. Cluster Comput 16, 191–206 (2013). https://doi.org/10.1007/s10586-011-0182-7

Download citation

Received: 20 July 2011
Accepted: 11 August 2011
Published: 10 September 2011
Issue Date: March 2013
DOI: https://doi.org/10.1007/s10586-011-0182-7

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Breaking the MapReduce stage barrier

Abstract

Article PDF

Similar content being viewed by others

Uncoupled MapReduce: A Balanced and Efficient Data Transfer Model

Performance Analysis of Adapting a MapReduce Framework to Dynamically Accommodate Heterogeneity

HadoopM: A Message-Enabled Data Processing System on Large Clusters

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Breaking the MapReduce stage barrier

Abstract

Article PDF

Similar content being viewed by others

Uncoupled MapReduce: A Balanced and Efficient Data Transfer Model

Performance Analysis of Adapting a MapReduce Framework to Dynamically Accommodate Heterogeneity

HadoopM: A Message-Enabled Data Processing System on Large Clusters

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation