Abstract
In this paper, we propose parallel processing of continuous queries over data streams to handle the bottleneck of single processor DSMSs. Queries are executed in parallel over the logical machines in a multiprocessing environment. Scheduling parallel execution of operators is performed via finding the shortest path in a weighted graph called Query Mega Graph (QMG), which is a logical view of K machines. By lapse of time, number of tuples waiting in queues of different operators may be very different. When a queue becomes full, re-scheduling is done by updating weight of edges of QMG. In the new computed path, machines with more workload will be used less. The proposed system is formally presented and its correctness is proved. It is also modeled in PetriNets and its performance is evaluated and compared with serial query processing as well as the Min-Latency scheduling algorithm. The presented system is shown to outperform them w.r.t. tuple latency (response time), memory usage, throughput and also tuple loss- critical parameters in any data stream management systems.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Babcock, B., et al.: Models and issues in data stream systems. In: Proceeding of PODS, Invited paper (2002)
The STREAM Group.: STREAM: The Stanford stream data manager. IEEE Data Engineering Bulletin, March (2003)
Abadi, D., et al.: Aurora: A new model and architecture for data stream management. VLDB J. 2(12), 120–139 (2003)
Babcock, B.: Models and issues in data stream systems. In: Proceeding of PODS (2002)
Golab, L., Ozsu, M.T.: Issues in data stream management. SIGMOD Records (2003)
Tatbul, N., et al.: Load shedding in a data stream manager. In: Proceedings of VLDB’03, Germany, pp. 309–320 (2003)
Chekuri, C., et al.: Scheduling problems in parallel query optimization. In: Proceeding of PODS (1995)
Babcock, B., et al.: Operator scheduling in data stream systems. VLDB J. 13(4), 333–353 (2004)
Hong, W.: Parallel query processing using shared memory multiprocessors and disk arrays. Ph.D. thesis (1992)
Hasan, W., et al.: Open issues in parallel query optimization. ACM SIGMOD Rec. 25(3), 28–33 (1996)
Graefe, G., et al.: Extensible Query Optimization and Parallel Execution in Volcano. Query Processing for Advanced Database Systems. Morgan Kaufmann, San Mateo (1994)
DeWitt, D.J., Gray, J.: Parallel database systems: The future of high performance database processing. Commun. ACM 36(6), 377–387 (1992)
Babu, S., Widom, J.: Continuous queries over data streams. In: SIGMOD Record (2001)
DeWitt, C., Naughton: Design and evaluation of alternative selection placement strategies in optimizing continuous queries. In: Proceeding of ICDE (2002)
Movaghar, A.: Performability modeling with stochastic activity networks, Ph.D. dissertation, University of Michigan (1985)
Abdollahi Azgomi, M., Movaghar, A.: Coloured stochastic activity networks: Definitions and behavior. In: Proceeding of UKPEW’04, UK, pp. 297–308 (2004)
Khalili, A., et al.: PDETool: A multi-formalism modeling tool for discrete-event systems based on SDES description. In: Lecture Notes in Computer Science, vol. 5606, pp. 343–352. Springer, Berlin (2009)
Carney, D., et al.: Operator scheduling in a data stream manager. In: Proceedings of the VLDB, Germany, pp. 838–849 (2003)
Arasu, et al.: Characterizing memory requirements for queries over continuous data streams. In: Proceeding of the PODS (2002)
Babcock, B., et al.: Chain: Operator scheduling for memory minimization in data stream systems. In: Proceeding of the ACM SIGMOD (2003)
Babu, et al.: Exploiting k-constraints to reduce memory overhead in continuous queries over data streams. Technical Report, November (2002)
Ghalambor, M., Safaeei, A.A., Azgomi, M.A.: DSMS scheduling regarding complex QoS metrics. In: IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), 10–13 May 2009
Abadi, D.J., et al.: The design of the Borealis stream processing engine. In: Proceeding of the CIDR (2005)
Xing, Y., et al.: Dynamic load distribution in the Borealis stream processor. In: Proceeding of ICDE (2005)
Babu, S.: Adaptive query processing in data stream management systems. Ph.D. thesis (2005)
Graefe, G.: Volcano—An extensible and parallel query evaluation system. IEEE Trans. Knowl. Data Eng. 6(1), 120–135 (1994)
Apers, P.M.G., et al.: PRISMA/DB: A parallel, main memory relational DBMS. IEEE Trans. Knowl. Data Eng. 4(6), 4–24 (1992)
Beringer, J., Hullermeier, E.: Online clustering of parallel data streams. Int. J. Data Knowl. Eng. 58(2), 180–204 (2006)
Kramer1, J.: Dynamic plan migration for snapshot-equivalent continuous queries in data stream systems. In: Proceeding of ICSWN (2006)
Zhu, Y., et al.: Dynamic plan migration for continuous queries over data streams. In: Proceeding of SIGMOD (2004)
Safaei, A.A., et al.: Using finite state machines in processing continuous queries. Int. Rev. Comput. Softw. 4(5), 551–556 (2009)
Tian, F., DeWitt, D.J.: Tuple routing strategies for distributed eddies. In: Proceeding of the VLDB (2003)
Chakravarthy, S., Pajjuri, V.: Scheduling Strategies and Their Evaluation in a Data Stream Management System. LNCS, vol. 4042. Springer, Berlin (2006)
Shah, M.A., et al.: Flux: An adaptive partitioning operator for continuous query systems. In: Proceeding of the ICDE (2003)
Tian, F., DeWitt, D.J.: Tuple routing strategies for distributed eddies. In: Proceeding of the VLDB (2003)
Nehme, R.V., et al.: Query mesh: Multiroute query processing technology. In: Proceeding of the VLDB (2009)
Nehme, R., et al.: Self-tuning query mesh for adaptive multi-route query processing. In: Proceeding of the ACM EDBT (2009)
Alqadi, Z.A.A., et al.: Performance analysis and evaluation of parallel matrix multiplication algorithms. World Appl. Sci. J. 5(2), 211–214 (2008)
Akl, S.G., et al.: Data-movement-intensive problems: Two folk theorems in parallel computation revisited. Theor. Comput. Sci. 95, 323–337 (1992)
Almasi, G.S., Gottlieb, A.: Highly Parallel Computing. Benjamin/Cummings, Redwood City (1989)
Cosnard, M., Trystram, D.: Parallel Algorithms and Architectures. International Thompson Computer Press, Boston (1995)
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by Ahmed K. Elmagarmid.
Rights and permissions
About this article
Cite this article
Safaei, A.A., Haghjoo, M.S. Parallel processing of continuous queries over data streams. Distrib Parallel Databases 28, 93–118 (2010). https://doi.org/10.1007/s10619-010-7066-3
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10619-010-7066-3