Abstract
Growing amounts of data and the demand to process them within time constraints have led to the development of big data systems. A generic design principle for such systems that allows for low-latency results is the lambda architecture. It prescribes that data is analyzed twice, combining batch and stream processing techniques, in order to provide a real-time view. This redundant processing makes the architecture very expensive. In cases where results are not continuously required at low latency, or where time constraints lie within several minutes, it is not yet possible to clearly decide whether both processing layers are necessary. Therefore, we propose stream processing on demand within the lambda architecture in order to use resources efficiently and reduce hardware investments. We use performance models as an analytical decision-making solution to predict response times of batch processes and to decide when to additionally deploy stream processes. Using a smart energy use case as an example, we implement our proposed solution and evaluate its accuracy.
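The decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `BatchPrediction`, `needs_stream_layer`, and `plan_deployment`, and the simple threshold comparison, are assumptions made for illustration. The sketch assumes only that a performance model yields a predicted batch response time, which is then compared with a time constraint.

```python
from dataclasses import dataclass


@dataclass
class BatchPrediction:
    """Predicted response time of one batch run, in seconds.

    Hypothetical stand-in for the output of a performance model.
    """
    response_time_s: float


def needs_stream_layer(prediction: BatchPrediction, time_constraint_s: float) -> bool:
    """Return True when the batch layer alone is predicted to miss the constraint."""
    return prediction.response_time_s > time_constraint_s


def plan_deployment(predictions, time_constraint_s):
    """List the layers to deploy for each predicted batch run.

    The batch layer is always included; the stream layer is added only
    on demand (assumed rule: predicted response time exceeds the constraint).
    """
    plan = []
    for prediction in predictions:
        layers = ["batch"]
        if needs_stream_layer(prediction, time_constraint_s):
            layers.append("stream")
        plan.append(layers)
    return plan
```

For example, with a time constraint of 300 seconds, a run predicted at 120 seconds would be served by the batch layer alone, while a run predicted at 400 seconds would additionally trigger the stream layer.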
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Kroß, J., Brunnert, A., Prehofer, C., Runkler, T.A., Krcmar, H. (2015). Stream Processing on Demand for Lambda Architectures. In: Beltrán, M., Knottenbelt, W., Bradley, J. (eds) Computer Performance Engineering. EPEW 2015. Lecture Notes in Computer Science, vol 9272. Springer, Cham. https://doi.org/10.1007/978-3-319-23267-6_16
DOI: https://doi.org/10.1007/978-3-319-23267-6_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23266-9
Online ISBN: 978-3-319-23267-6
eBook Packages: Computer Science (R0)