Web-Scale Job Scheduling

Cirne, Walfredo; Frachtenberg, Eitan

doi:10.1007/978-3-642-35867-8_1

Walfredo Cirne &
Eitan Frachtenberg

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7698)

Included in the following conference series:

Workshop on Job Scheduling Strategies for Parallel Processing


Abstract

Web datacenters and clusters can be larger than the world’s largest supercomputers, and run workloads that are at least as heterogeneous and complex as their high-performance computing counterparts. And yet little is known about the unique job scheduling challenges of these environments. This article aims to ameliorate this situation. It discusses the challenges of running web infrastructure and describes several techniques to address them. It also presents some of the problems that remain open in the field.



Author information

Authors and Affiliations

Google, USA
Walfredo Cirne
Facebook, USA
Eitan Frachtenberg


Editor information

Editors and Affiliations

Google, 1600 Amphitheatre Parkway, 94043, Mountain View, CA, USA
Walfredo Cirne
Mathematics and Computer Science Division, Argonne National Laboratory, Bldg 240, 60439, Argonne, IL, USA
Narayan Desai
Facebook Inc., 1601 Willow Road, 94025, Menlo Park, CA, USA
Eitan Frachtenberg
Robotics Research Institute, TU Dortmund, Otto-Hahn-Str. 8, 44227, Dortmund, Germany
Uwe Schwiegelshohn


About this paper

Cite this paper

Cirne, W., Frachtenberg, E. (2013). Web-Scale Job Scheduling. In: Cirne, W., Desai, N., Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2012. Lecture Notes in Computer Science, vol 7698. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35867-8_1


DOI: https://doi.org/10.1007/978-3-642-35867-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35866-1
Online ISBN: 978-3-642-35867-8
eBook Packages: Computer Science, Computer Science (R0)
