Linked Bernoulli Synopses: Sampling along Foreign Keys

Gemulla, Rainer; Rösch, Philipp; Lehner, Wolfgang

doi:10.1007/978-3-540-69497-7_4

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5069)

Included in the following conference series:

International Conference on Scientific and Statistical Database Management

Abstract

Random sampling is a popular technique for providing fast approximate query answers, especially in data warehouse environments. Compared to other types of synopses, random sampling has the advantage of retaining the dataset’s dimensionality; it also associates probabilistic error bounds with the query results. Most of the available sampling techniques focus on table-level sampling, that is, they produce a sample of only a single database table. Queries that contain joins over multiple tables cannot be answered with such samples because join results on random samples are often small and skewed. Schema-level sampling techniques, by contrast, support queries containing joins by design. In this paper, we introduce Linked Bernoulli Synopses, a schema-level sampling scheme based upon the well-known Join Synopses. Both schemes rely on the idea of maintaining foreign-key integrity in the synopses; they are therefore suited to process queries containing arbitrary foreign-key joins. In contrast to Join Synopses, however, Linked Bernoulli Synopses correlate the sampling processes of the different tables in the database so as to minimize the space overhead, without destroying the uniformity of the individual samples. We also discuss how to compute Linked Bernoulli Synopses which maximize the effective sampling fraction for a given memory budget. Computing the optimum solution is often prohibitively expensive, so approximate solutions are needed. We propose a simple heuristic approach which is fast and seems to produce close-to-optimum results in practice. We conclude the paper with an evaluation of our methods on both synthetic and real-world datasets.
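
To illustrate why the abstract stresses foreign-key integrity, the following minimal Python sketch compares two ways of sampling a pair of tables. It is not the Linked Bernoulli Synopses algorithm, whose details are not given on this page; it only shows the general idea of closing a sample over a foreign key. The table names (`customers`, `orders`), the column names, the sampling rate `q`, and all helper functions are assumptions made for illustration.

```python
import random

def bernoulli_sample(rows, q, rng):
    """Keep each row independently with probability q (Bernoulli sampling)."""
    return [row for row in rows if rng.random() < q]

def referenced_parents(child_sample, parent_rows, fk, pk):
    """Return every parent row that some sampled child row points to."""
    referenced = {row[fk] for row in child_sample}
    return [row for row in parent_rows if row[pk] in referenced]

def fk_join(children, parents, fk, pk):
    """Foreign-key join; child rows without a matching parent are dropped."""
    by_key = {p[pk]: p for p in parents}
    return [{**by_key[c[fk]], **c} for c in children if c[fk] in by_key]

# Hypothetical schema: orders.cust_id references customers.cust_id.
rng = random.Random(0)
customers = [{"cust_id": i, "region": i % 4} for i in range(1000)]
orders = [{"order_id": i, "cust_id": rng.randrange(1000)} for i in range(10000)]
q = 0.1  # assumed sampling rate

orders_sample = bernoulli_sample(orders, q, rng)

# Table-level sampling: each table is sampled on its own, so many sampled
# orders lose their customer and vanish from the join result.
customers_sample = bernoulli_sample(customers, q, rng)
naive_join = fk_join(orders_sample, customers_sample, "cust_id", "cust_id")

# Foreign-key-closed sampling: store every customer that a sampled order
# references, so each sampled order survives the join.
customers_closed = referenced_parents(orders_sample, customers, "cust_id", "cust_id")
closed_join = fk_join(orders_sample, customers_closed, "cust_id", "cust_id")

print(len(orders_sample), len(naive_join), len(closed_join))
```

In this sketch the closed variant stores extra customer rows only to keep references intact; the abstract describes reducing that kind of space overhead as the goal of correlating the sampling processes across tables.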

Author information

Authors and Affiliations

Database Technology Group, Technische Universität Dresden, Germany
Rainer Gemulla, Philipp Rösch & Wolfgang Lehner

Editor information

Bertram Ludäscher, Nikos Mamoulis

About this paper

Cite this paper

Gemulla, R., Rösch, P., Lehner, W. (2008). Linked Bernoulli Synopses: Sampling along Foreign Keys. In: Ludäscher, B., Mamoulis, N. (eds) Scientific and Statistical Database Management. SSDBM 2008. Lecture Notes in Computer Science, vol 5069. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69497-7_4

DOI: https://doi.org/10.1007/978-3-540-69497-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69476-2
Online ISBN: 978-3-540-69497-7
eBook Packages: Computer Science, Computer Science (R0)
