Abstract
Under limited resources, targeted prioritized data stream systems (TP) adjust the processing order of tuples to produce the most significant results first. In TP, an aggregation operator may not receive all tuples within an aggregation group. Typically, the aggregation operator is unaware of how many and which tuples are missing. As a consequence, averages computed over these streams can be skewed, invalid, or, worse yet, misleading. Such inaccurate results are unacceptable for many applications. TP-Ag is a novel aggregate operator for TP that produces reliable average calculations for normally distributed data under adverse conditions. It determines at run-time which results to produce and which subgroups of the aggregate population are used to generate each result. A carefully designed application of Cochran’s sample size methodology measures the reliability of results. Each result is annotated with the subgroups used in its production. Our experimental findings substantiate that TP-Ag increases the reliability of average calculations compared to state-of-the-art approaches for TP systems (up to 91% more accurate results).
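The abstract does not spell out how TP-Ag applies Cochran’s methodology, so the following is only a minimal sketch of the commonly stated textbook sample-size formula for estimating a mean, n0 = (z·S/d)² with the finite population correction n = n0 / (1 + n0/N). The function names, the parameters (`std_dev`, `margin`, `population`, `z`), and the idea of comparing the number of received tuples against the required size are illustrative assumptions, not the authors' implementation.

```python
import math


def cochran_required_size(std_dev, margin, population, z=1.96):
    """Textbook Cochran sample size for estimating a mean.

    std_dev    -- assumed standard deviation S of the aggregation group
    margin     -- tolerated absolute error d of the average
    population -- number of tuples N that belong to the group
    z          -- normal quantile (1.96 corresponds to ~95% confidence)

    All names are hypothetical; this is not TP-Ag's actual interface.
    """
    if margin <= 0 or population <= 0 or std_dev < 0:
        raise ValueError("margin and population must be positive, std_dev non-negative")
    # Sample size for an effectively infinite population.
    n0 = (z * std_dev / margin) ** 2
    # Finite population correction.
    n = n0 / (1.0 + n0 / population)
    return max(1, math.ceil(n))


def is_reliable(received, std_dev, margin, population, z=1.96):
    """Assumed reliability check: enough tuples arrived to trust the average."""
    return received >= cochran_required_size(std_dev, margin, population, z)


if __name__ == "__main__":
    needed = cochran_required_size(std_dev=10.0, margin=2.0, population=1000)
    print("tuples needed:", needed)
    print("reliable with 60 tuples:", is_reliable(60, 10.0, 2.0, 1000))
```

Under these assumptions, an average over a group would be reported only when the number of tuples that actually reached the operator meets the required size; otherwise the result would be withheld or computed over a different subgroup.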
This work is supported by GAANN and NSF grants: IIS-1018443 & 0917017 & 0414567 & 0551584 (equipment grant).
This work started during Karen’s Ph.D. study at WPI.
References
Abadi, D.J., Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., Zdonik, S.: Aurora: A new model and architecture for data stream management. The International Journal on Very Large Data Bases, 120–139 (2003)
Arasu, A., et al.: The CQL continuous query language: semantic foundations and query execution. VLDB Journal, 121–142 (2006)
Babcock, B., et al.: Load shedding for aggregation queries over data streams. In: ICDE, p. 350 (2004)
Basaran, C., Kang, K.-D., Zhou, Y., Suzer, M.H.: Adaptive load shedding via fuzzy control in data stream management systems. In: 2012 5th IEEE International Conference on Service-Oriented Computing and Applications (SOCA), pp. 1–8. IEEE (2012)
Carney, D., et al.: Monitoring streams: A new class of data management applications. In: VLDB, pp. 215–226 (2002)
Cochran, W.G.: Sampling Techniques, 3rd edn. John Wiley (1977)
Cormode, G., Korn, F., Tirthapura, S.: Time-decaying aggregates in out-of-order streams. PODS, 89–98 (2008)
Das, A., et al.: Semantic approximation of data stream joins. IEEE, 44–59 (2005)
Dobra, A., et al.: Processing complex aggregate queries over data streams. In: SIGMOD, pp. 61–72 (2002)
Fama, E.F.: The behavior of stock-market prices. The Journal of Business 38(1), 34–105 (1965)
Yahoo Finance: http://finance.yahoo.com/
Gainey, R.R., et al.: Understanding the experience of house arrest with electronic monitoring: An analysis of quantitative and qualitative data. International Journal of Offender Therapy and Comparative Criminology (2000)
Golab, L., et al.: Update-pattern-aware modeling and processing of continuous queries. In: SIGMOD, pp. 658–669 (2005)
Guo, J.-F., He, C.-L.: Load shedding for sliding window aggregation queries over data streams. Application Research of Computers, 1–23 (2009)
Hellerstein, J.M., Haas, P.J., Wang, H.J.: Online aggregation. SIGMOD 26(2), 171–182 (1997)
Hoeffding, W.: Probability Inequalities for Sums of Bounded Random Variables. Journal of the American Statistical Association 58(301), 13–30 (1963)
Hoyle, S.: Use and abuse of statistics. ASLIB Proc. 40(11–12), 321–324 (1988)
Kang, H.G., Mahoney, D.F., Hoenig, H., Hirth, V.A., Bonato, P., Hajjar, I., Lipsitz, L.A.: In situ monitoring of health in older adults: technologies and issues. Journal of the American Geriatrics Society 58(8), 1579–1586 (2010)
Kargupta, H., Park, B.-H., Pittie, S., Liu, L., Kushraj, D., Sarkar, K.: MobiMine: monitoring the stock market from a PDA. SIGKDD Explor. Newsl. 3(2), 37–46 (2002)
Katopodis, P., et al.: A hybrid, large-scale wireless sensor network for missile defense. IEEE, 1–5 (2007)
Li, J., et al.: No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. SIGMOD 34, 39–44 (2005)
Li, J., et al.: Semantics and evaluation techniques for window aggregates in data streams. SIGMOD, 311–322 (2005)
Lin, C.-C., et al.: Wireless health care service system for elderly with dementia. IEEE, 696–704 (2006)
Lin, O., Qin, Z., Jingjing, Q., Qiumei, P.: A new linear programming based load-shedding strategy. In: 2012 11th International Symposium on Distributed Computing and Applications to Business, Engineering & Science (DCABES), pp. 260–263. IEEE (2012)
Liu, B., et al.: Run-time operator state spilling for memory intensive long-running queries. SIGMOD, 347–358 (2006)
Longbo, Z., Zhanhuai, L., Zhenyou, W., Min, Y.: Semantic load shedding for sliding window join-aggregation queries over data streams. In: International Conference on Convergence Information Technology, pp. 2152–2155 (2007)
Ma, L., Zhang, Q., Shi, N.: A semantic load shedding algorithm based on priority table in data stream system. In: International Conference on Fuzzy Systems and Knowledge Discovery, pp. 1167–1172 (2010)
Nehme, R.V., Rundensteiner, E.A.: ClusterSheddy: Load shedding using moving clusters over spatio-temporal data streams. In: Kotagiri, R., Radha Krishna, P., Mohania, M., Nantajeewarawat, E. (eds.) DASFAA 2007. LNCS, vol. 4443, pp. 637–651. Springer, Heidelberg (2007)
Network, M.: Where have all the investors gone? (February 2012). http://money.msn.com
Olston, C., Widom, J.: Offering a precision-performance tradeoff for aggregation queries over replicated data. Technical Report 2000–16, Stanford InfoLab (2000)
Associated Press: Officials lose track of 16,000 sex offenders after GPS fails (2010). http://www.foxnews.com
Reiss, F., Hellerstein, J.M.: Data triage: An adaptive architecture for load shedding in TelegraphCQ. In: IEEE International Conference on Data Engineering, pp. 155–156 (2005)
Rundensteiner, E.A., et al.: CAPE: Continuous query engine with heterogeneous-grained adaptivity. In: VLDB, pp. 1353–1356 (2004)
Senthamilarasu, S., Hemalatha, M.: Load shedding techniques based on windows in data stream systems. In: 2012 International Conference on Emerging Trends in Science, Engineering and Technology (INCOSET), pp. 68–73. IEEE (2012)
Tatbul, N.: QoS-driven load shedding on data streams. In: Chaudhri, A.B., Unland, R., Djeraba, C., Lindner, W. (eds.) EDBT 2002. LNCS, vol. 2490, pp. 566–576. Springer, Heidelberg (2002)
Tatbul, N., Çetintemel, U., Zdonik, S.: Staying fit: Efficient load shedding techniques for distributed stream processing. In: International Conference on Very Large Data Bases, pp. 159–170 (2007)
Tatbul, N., et al.: Load shedding in a data stream manager. In: VLDB, pp. 309–320 (2003)
Tatbul, N., Zdonik, S.: Window-aware load shedding for aggregation queries over data streams. VLDB, 799–810 (2006)
Pham, T.N., Chrysanthis, P.K., Labrinidis, A.: Self-managing load shedding for data stream management systems, 1–7 (2013)
Wang, H.-Y., Qin, Z.-D., Li, B.-Y., Cong, J., Wang, Z.-J., Du, M.: Novel load shedding approach for real-time data stream processing. Journal of Chinese Computer Systems, 1–4 (2010)
Wei, M., et al.: Achieving high output quality under limited resources through structure-based spilling in XML streams. PVLDB, 1267–1278 (2010)
Works, K., Rundensteiner, E.: Preferential resource allocation in stream processing systems. International Journal of Cooperative Information Systems (2014)
Works, K., Rundensteiner, E.A.: The proactive promotion engine. In: ICDE, pp. 1340–1343 (2011)
Zdonik, S.B., et al.: The Aurora and Medusa projects. IEEE, 3–10 (2003)
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Works, K., Rundensteiner, E.A. (2014). Reliable Aggregation over Prioritized Data Streams. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XIV. Lecture Notes in Computer Science, vol 8800. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45714-6_1
DOI: https://doi.org/10.1007/978-3-662-45714-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45713-9
Online ISBN: 978-3-662-45714-6
eBook Packages: Computer Science (R0)