Abstract
Conventional query processing techniques are aimed at queries that access small amounts of data and need every data item they access to compute the answer. When a database is used for statistical analysis as well as for operational purposes, however, some types of queries may require a large part of the database to compute the answer. This can create a data access bottleneck, caused by the excessive number of disk accesses needed to bring the data into primary memory. One example is the computation of statistical parameters such as count, average, median, and standard deviation, which are useful for statistical analysis of the database. Another example that faces this bottleneck is verifying the truth of a set of predicates (goals) against the current database state for the purposes of intelligent decision making. One solution to this problem is to maintain a set of precomputed information about the database in a view or a snapshot, so that statistical queries can be processed using the view rather than the actual database. A crucial issue is that the precision of the precomputed information in the view deteriorates over time, because the underlying database is dynamic. The answer provided is therefore approximate, which is acceptable in many circumstances, especially when the error is bounded. The tradeoff is that queries are processed faster at the expense of precision in the answer. The concept of precision in the context of database queries is formalized, and a data model incorporating it is developed. Algorithms are designed to maintain materialized views of data to specified degrees of precision.
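To make the precision-time tradeoff concrete, here is a minimal sketch, assuming a single numeric column and a precision requirement stated only for COUNT. The class `ApproxSummaryView`, the parameter `max_count_error`, and the policy "refresh once the number of unapplied changes exceeds the allowed error" are all hypothetical illustrations, not the authors' model or algorithms. The view stores count, sum, and sum of squares as of its last refresh and answers statistical queries from those values instead of scanning the base data.

```python
import math


class ApproxSummaryView:
    """Hypothetical summary view over one numeric column (illustration only)."""

    def __init__(self, base, max_count_error):
        self.base = base                        # dict: key -> numeric value (the "real" database)
        self.max_count_error = max_count_error  # assumed precision spec, on COUNT only
        self.refreshes = 0
        self._refresh()

    def _refresh(self):
        # Full recomputation: the expensive scan the view exists to avoid.
        values = list(self.base.values())
        self.n = len(values)
        self.s = sum(values)
        self.ss = sum(v * v for v in values)
        self.pending = 0  # changes applied to the base since the last refresh
        self.refreshes += 1

    def insert(self, key, value):
        self.base[key] = value
        self._note_change()

    def delete(self, key):
        del self.base[key]
        self._note_change()

    def _note_change(self):
        # Each insert or delete changes the true count by at most 1, so
        # |true count - view count| <= pending. Refresh only when that bound
        # would exceed the requested precision.
        self.pending += 1
        if self.pending > self.max_count_error:
            self._refresh()

    def count(self):
        # Approximate answer together with its guaranteed error bound.
        return self.n, self.pending

    def mean(self):
        return self.s / self.n

    def stddev(self):
        # Population standard deviation computed from the stored sums.
        var = self.ss / self.n - (self.s / self.n) ** 2
        return math.sqrt(max(var, 0.0))
```

For example, with `max_count_error=2`, two inserts leave `count()` answering from the stale view with a bound of 2, and a third change triggers a refresh. Raising `max_count_error` makes refreshes rarer at the cost of a looser answer. In this sketch only the count carries a guaranteed bound; bounding mean or standard deviation would need assumptions about the values being changed.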
This work was done while the first author was on leave from the C.S. Division, U.C. Berkeley.
Copyright information
© 1989 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Srivastava, J., Rotem, D. (1989). Precision-time tradeoffs: A paradigm for processing statistical queries on databases. In: Rafanelli, M., Klensin, J.C., Svensson, P. (eds) Statistical and Scientific Database Management. SSDBM 1988. Lecture Notes in Computer Science, vol 339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0027516
DOI: https://doi.org/10.1007/BFb0027516
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-50575-4
Online ISBN: 978-3-540-46045-9
eBook Packages: Springer Book Archive