Precision-time tradeoffs: A paradigm for processing statistical queries on databases

Srivastava, Jaideep; Rotem, Doron

doi:10.1007/BFb0027516

Jaideep Srivastava¹ & Doron Rotem¹

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 339)

Included in the following conference series:

International Conference on Scientific and Statistical Database Management

Abstract

Conventional query processing techniques are aimed at queries that access small amounts of data and need every accessed data item to compute the answer. When a database is used for statistical analysis as well as for operational purposes, some types of queries may require a large part of the database to compute the answer. This can create a data access bottleneck, caused by the excessive number of disk accesses needed to bring the data into primary memory. One example is the computation of statistical parameters such as count, average, median, and standard deviation, which are useful for statistical analysis of the database. Another example that faces this bottleneck is verifying the truth of a set of predicates (goals) against the current database state for the purposes of intelligent decision making. One solution is to maintain precomputed information about the database in a view or a snapshot and to process statistical queries using the view rather than the real database. A crucial issue is that the precision of the precomputed information in the view deteriorates over time because the underlying database is dynamic. The answer provided is therefore approximate, which is acceptable in many circumstances, especially when the error is bounded. The tradeoff is that queries are processed faster at the expense of precision in the answer. The concept of precision in the context of database queries is formalized, and a data model that incorporates it is developed. Algorithms are designed to maintain materialized views of data to specified degrees of precision.

This work was done while the first author was on leave from the C.S. Division, U.C. Berkeley.
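As an illustration only, the following Python sketch shows one way a view of COUNT and SUM could be kept within a specified degree of precision. It is not the algorithm from the chapter: the class `PrecisionBoundedSumView`, its parameters `max_abs` and `sum_tolerance`, and the assumption that every update inserts or deletes a single value v with |v| ≤ `max_abs` are all hypothetical. Under that assumption, k updates not yet reflected in the view can move the true COUNT by at most k and the true SUM by at most k·`max_abs`, so the view answers queries from its snapshot and rescans the base data only when that bound would exceed the tolerance.

```python
from dataclasses import dataclass


@dataclass
class PrecisionBoundedSumView:
    """Hypothetical snapshot of COUNT and SUM kept within a SUM error tolerance.

    Assumption: every update inserts or deletes one value v with |v| <= max_abs,
    so each update not yet reflected in the snapshot can shift the true COUNT
    by at most 1 and the true SUM by at most max_abs.
    """
    base: list            # stands in for the real database; scanning it is costly
    max_abs: float        # assumed bound on |v| for any inserted or deleted value
    sum_tolerance: float  # required precision: max allowed |true SUM - view SUM|
    count: int = 0
    total: float = 0.0
    pending: int = 0      # updates applied to base since the last refresh
    refreshes: int = 0    # number of full scans performed so far

    def __post_init__(self):
        self.refresh()

    def refresh(self):
        # Full scan: exact, but this is the disk-bound step the view tries to avoid.
        self.count = len(self.base)
        self.total = float(sum(self.base))
        self.pending = 0
        self.refreshes += 1

    def count_error_bound(self):
        return self.pending

    def sum_error_bound(self):
        return self.pending * self.max_abs

    def _after_update(self):
        self.pending += 1
        # Rescan only when the guaranteed bound would exceed the required precision.
        if self.sum_error_bound() > self.sum_tolerance:
            self.refresh()

    def insert(self, v):
        if abs(v) > self.max_abs:
            raise ValueError("value violates the assumed bound max_abs")
        self.base.append(v)
        self._after_update()

    def delete(self, v):
        if abs(v) > self.max_abs:
            raise ValueError("value violates the assumed bound max_abs")
        self.base.remove(v)  # raises ValueError if v is not present
        self._after_update()

    def query_count(self):
        # Answered from the snapshot without scanning: (approximate value, error bound).
        return self.count, self.count_error_bound()

    def query_sum(self):
        return self.total, self.sum_error_bound()


if __name__ == "__main__":
    view = PrecisionBoundedSumView(base=[10.0, 20.0, 30.0], max_abs=50.0, sum_tolerance=100.0)
    view.insert(5.0)
    view.insert(15.0)
    print(view.query_sum())  # (60.0, 100.0): true SUM is 80.0, within the bound
    view.insert(25.0)        # bound would reach 150.0 > 100.0, so the view rescans
    print(view.query_sum())  # (105.0, 0.0)
```

In this sketch, raising `sum_tolerance` lets more updates accumulate between full scans (faster but less precise answers), while lowering it forces more frequent rescans (slower but more precise answers), which is the precision-time tradeoff in miniature.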

Author information

Authors and Affiliations

Computer Science Research, Lawrence Berkeley Laboratory, University of California, Berkeley, CA 94720
Jaideep Srivastava & Doron Rotem

Editor information

Maurizio Rafanelli, John C. Klensin, Per Svensson

About this chapter

Cite this chapter

Srivastava, J., Rotem, D. (1989). Precision-time tradeoffs: A paradigm for processing statistical queries on databases. In: Rafanelli, M., Klensin, J.C., Svensson, P. (eds) Statistical and Scientific Database Management. SSDBM 1988. Lecture Notes in Computer Science, vol 339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0027516

DOI: https://doi.org/10.1007/BFb0027516
Published: 09 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-50575-4
Online ISBN: 978-3-540-46045-9
eBook Packages: Springer Book Archive
