Abstract
Probabilistic Databases (PDBs) lie at the expressive intersection of databases, first-order logic, and probability theory. PDBs employ logical deduction rules to process Select-Project-Join (SPJ) queries, which form the basis for a variety of declarative query languages such as Datalog, Relational Algebra, and SQL. They employ logical consistency constraints to resolve data inconsistencies, and they represent query answers via logical lineage formulas (aka.“data provenance”) to trace the dependencies between these answers and the input tuples that led to their derivation. While the literature on PDBs dates back to more than 25 years of research, only fairly recently the key role of lineage for establishing a closed and complete representation model of relational operations over this kind of probabilistic data was discovered. Although PDBs benefit from their efficient and scalable database infrastructures for data storage and indexing, they couple the data computation with probabilistic inference, the latter of which remains a #P-hard problem also in the context of PDBs.
In this chapter, we provide a review on the key concepts of PDBs with a particular focus on our own recent research results related to this field. We highlight a number of ongoing research challenges related to PDBs, and we keep referring to an information extraction (IE) scenario as a running application to manage uncertain and temporal facts obtained from IE techniques directly inside a PDB setting.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
References
Abiteboul, S., Herr, L., den Bussche, J.V.: Temporal Connectives versus Explicit Timestamps in Temporal Query Languages. In: Clifford, J., Tuzhilin, A. (eds.) Recent Advances in Temporal Databases, Proceedings of the International Workshop on Temporal Databases, Zürich, Switzerland, September 17-18. Workshops in Computing, pp. 43–57. Springer, New York (1995)
Abiteboul, S., Hull, R., Vianu, V. (eds.): Foundations of Databases, 1st edn. Addison-Wesley, Boston (1995)
Abiteboul, S., Kanellakis, P., Grahne, G.: On the representation and querying of sets of possible worlds. SIGMOD Record 16(3), 34–48 (1987)
Allen, J.F.: Maintaining knowledge about temporal intervals. Communications of the ACM 26(11), 832–843 (1983)
Antova, L., Jansen, T., Koch, C., Olteanu, D.: Fast and Simple Relational Processing of Uncertain Data. In: Proceedings of the 24th International Conference on Data Engineering, ICDE, pp. 983–992. IEEE Computer Society, Washington, DC (2008)
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.G.: DBpedia: A Nucleus for a Web of Open Data. In: Aberer, K., et al. (eds.) ISWC/ASWC 2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007)
Benjelloun, O., Das Sarma, A., Halevy, A., Theobald, M., Widom, J.: Databases with uncertainty and lineage. VLDB Journal 17(2), 243–264 (2008)
Boulos, J., Dalvi, N., Mandhani, B., Mathur, S., Re, C., Suciu, D.: MYSTIQ: a system for finding more answers by using probabilities. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 891–893. ACM, New York (2005)
Brin, S.: Extracting Patterns and Relations from the World Wide Web. In: Atzeni, P., Mendelzon, A.O., Mecca, G. (eds.) WebDB 1998. LNCS, vol. 1590, pp. 172–183. Springer, Heidelberg (1999)
Ceri, S., Gottlob, G., Tanca, L.: What You Always Wanted to Know About Datalog (And Never Dared to Ask). IEEE Transactions on Knowledge and Data Engineering 1(1), 146–166 (1989)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. MIT Press, Cambridge (2009)
Cui, Y., Widom, J., Wiener, J.L.: Tracing the Lineage of View Data in a Warehousing Environment. ACM Transactions on Database Systems 25(2), 179–227 (2000)
Dalvi, N., Ré, C., Suciu, D.: Probabilistic Databases: Diamonds in the Dirt. Communications of the ACM 52(7), 86–94 (2009)
Dalvi, N., Schnaitter, K., Suciu, D.: Computing Query Probability with Incidence Algebras. In: Proceedings of the Twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pp. 203–214. ACM, New York (2010)
Dalvi, N., Suciu, D.: Efficient query evaluation on probabilistic databases. VLDB Journal 16(4), 523–544 (2007)
Dalvi, N., Suciu, D.: The dichotomy of conjunctive queries on probabilistic structures. In: Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pp. 293–302. ACM, New York (2007)
Dalvi, N., Suciu, D.: The Dichotomy of Probabilistic Inference for Unions of Conjunctive Queries. Journal of the ACM 59(6), 30:1–30:87 (2013)
Dickenstein, A., Emiris, I.Z.: Solving Polynomial Equations: Foundations, Algorithms, and Applications, 1st edn. Springer, Heidelberg (2010)
Dignös, A., Böhlen, M.H., Gamper, J.: Temporal alignment. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 433–444. ACM, New York (2012)
Dylla, M.: Efficient Querying and Learning in Probabilistic and Temporal Databases. Doctoral Dissertation. Saarland University (2014)
Dylla, M., Miliaraki, I., Theobald, M.: A Temporal-Probabilistic Database Model for Information Extraction. Proceedings of the VLDB Endowment 6(14), 1810–1821 (2013)
Dylla, M., Theobald, M.: Learning Tuple Probabilities in Probabilistic Databases. Research Report MPI-I-2014-5-001, Max-Planck-Institut für Informatik, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany (January 2014)
Dylla, M., Theobald, M., Miliaraki, I.: Top-k query processing in probabilistic databases with non-materialized views. In: Proceedings of the 29th International Conference on Data Engineering, ICDE, pp. 122–133. IEEE Computer Society, Washington, DC (2013)
Fagin, R., Lotem, A., Naor, M.: Optimal Aggregation Algorithms for Middleware. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pp. 102–113. ACM, New York (2001)
Feller, W.: An introduction to probability theory and its applications, 3rd edn. Wiley, Hoboken (1968)
Fink, R., Olteanu, D.: On the optimal approximation of queries using tractable propositional languages. In: Proceedings of the 14th International Conference on Database Theory, ICDT, pp. 174–185. ACM, New York (2011)
Fisher, M., Gabbay, D., Vila, L.: Handbook of Temporal Reasoning in Artificial Intelligence, 1st edn. Foundations of Artificial Intelligence. Elsevier, Essex (2005)
Fuhr, N., Rölleke, T.: A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems. ACM Transactions on Information Systems 15(1), 32–66 (1997)
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, New York (1990)
Grädel, E., Gurevich, Y., Hirsch, C.: The Complexity of Query Reliability. In: Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS, pp. 227–234. ACM, New York (1998)
Green, T.J., Tannen, V.: Models for Incomplete and Probabilistic Information. IEEE Data Engineering Bulletin 29(1), 17–24 (2006)
Widom, J., Garcia-Molina, H., Ullman, J.D.: Database systems: the complete book, 1st edn. Prentice-Hall, Upper Saddle River (2002)
Hoffart, J., Suchanek, F.M., Berberich, K., Weikum, G.: YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia. Artificial Intelligence 194, 28–61 (2013)
Ilyas, I.F., Beskales, G., Soliman, M.A.: A Survey of Top-k Query Processing Techniques in Relational Database Systems. ACM Computing Surveys 40(4), 11:1–11:58 (2008)
Jensen, C.S.: Temporal Database Management. PhD thesis, Aalborg University, Aalborg, Denmark (April 2000)
Jha, A., Suciu, D.: Knowledge Compilation Meets Database Theory: Compiling Queries to Decision Diagrams. Theory of Computing Systems 52(3), 403–440 (2013)
Kanagal, B., Deshpande, A.: Lineage processing over correlated probabilistic databases. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 675–686. ACM, New York (2010)
Kanagal, B., Li, J., Deshpande, A.: Sensitivity analysis and explanations for robust query evaluation in probabilistic databases. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 841–852. ACM, New York (2011)
Kanellakis, P.C., Kuper, G.M., Revesz, P.Z.: Constraint query languages. In: Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS, pp. 299–313. ACM, New York (1990)
Khanna, S., Roy, S., Tannen, V.: Queries with Difference on Probabilistic Databases. Proceedings of the VLDB Endowment 4(11), 1051–1062 (2011)
Koch, C., Olteanu, D.: Conditioning probabilistic databases. Proceedings of the VLDB Endowment 1(1), 313–325 (2008)
Nakashole, N., Weikum, G., Suchanek, F.: PATTY: A Taxonomy of Relational Patterns with Semantic Types. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL, pp. 1135–1145. Association for Computational Linguistics, Stroudsburg (2012)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, Heidelberg (2006)
Ohrstrom, P.: Temporal Logic: From Ancient Ideas to Artificial Intelligence. Springer, Heidelberg (2009)
Olteanu, D., Huang, J.: Using OBDDs for Efficient Query Evaluation on Probabilistic Databases. In: Greco, S., Lukasiewicz, T. (eds.) SUM 2008. LNCS (LNAI), vol. 5291, pp. 326–340. Springer, Heidelberg (2008)
Olteanu, D., Huang, J., Koch, C.: SPROUT: Lazy vs. Eager Query Plans for Tuple-Independent Probabilistic Databases. In: Proceedings of the 25th International Conference on Data Engineering, ICDE, pp. 640–651. IEEE Computer Society, Washington, DC (2009)
Olteanu, D., Huang, J., Koch, C.: Approximate confidence computation in probabilistic databases. In: Proceedings of the 26th International Conference on Data Engineering, ICDE, pp. 145–156. IEEE Computer Society, Washington, DC (2010)
Olteanu, D., Wen, H.: Ranking Query Answers in Probabilistic Databases: Complexity and Efficient Algorithms. In: Proceedings of the 28th International Conference on Data Engineering, ICDE, pp. 282–293. IEEE Computer Society, Washington, DC (2012)
Ré, C., Dalvi, N.N., Suciu, D.: Efficient Top-k Query Evaluation on Probabilistic Data. In: Proceedings of the 23rd International Conference on Data Engineering, ICDE, pp. 886–895. IEEE Computer Society, Washington, DC (2007)
Ré, C., Suciu, D.: Approximate Lineage for Probabilistic Databases. Proceedings of the VLDB Endowment 1(1), 797–808 (2008)
Ré, C., Suciu, D.: The trichotomy of HAVING queries on a probabilistic database. VLDB Journal 18(5), 1091–1116 (2009)
Rekatsinas, T., Deshpande, A., Getoor, L.: Local Structure and Determinism in Probabilistic Databases. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 373–384. ACM, New York (2012)
Sagiv, Y., Yannakakis, M.: Equivalences Among Relational Expressions with the Union and Difference Operators. Journal of the ACM 27(4), 633–655 (1980)
Sarma, A.D., Theobald, M., Widom, J.: Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases. In: Proceedings of the 24th International Conference on Data Engineering, ICDE, pp. 1023–1032. IEEE Computer Society, Washington, DC (2008)
Das Sarma, A., Theobald, M., Widom, J.: LIVE: a lineage-supported versioned DBMS. In: Gertz, M., Ludäscher, B. (eds.) SSDBM 2010. LNCS, vol. 6187, pp. 416–433. Springer, Heidelberg (2010)
Sen, P., Deshpande, A., Getoor, L.: PrDB: managing and exploiting rich correlations in probabilistic databases. VLDB Journal 18(5), 1065–1090 (2009)
Sen, P., Deshpande, A., Getoor, L.: Read-once Functions and Query Evaluation in Probabilistic Databases. In: Proceedings of the VLDB Endowment, vol. 3(1-2), pp. 1068–1079 (2010)
Shiryaev, A.N.: Probability, 2nd edn. Springer, Heidelberg (1995)
Singh, S., Mayfield, C., Mittal, S., Prabhakar, S., Hambrusch, S.E., Shah, R.: Orion 2.0: native support for uncertain data. In: SIGMOD Conference, pp. 1239–1242 (2008)
Smullyan, R.M.: First-order logic, 1st edn. Springer, Heidelberg (1968)
Stoyanovich, J., Davidson, S., Milo, T., Tannen, V.: Deriving Probabilistic Databases with Inference Ensembles. In: Proceedings of the 27th International Conference on Data Engineering, ICDE, pp. 303–314. IEEE Computer Society, Washington, DC (2011)
Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: A Core of Semantic Knowledge. In: Proceedings of the 16th International Conference on World Wide Web, WWW, pp. 697–706. ACM, New York (2007)
Suciu, D., Olteanu, D., Christopher, R., Koch, C.: Probabilistic Databases, 1st edn. Morgan & Claypool, San Rafael (2011)
Tuzhilin, A., Clifford, J.: A Temporal Relational Algebra As Basis for Temporal Relational Completeness. In: Proceedings of the 16th International Conference on Very Large Data Bases, VLDB, pp. 13–23. Morgan Kaufmann Publishers, San Rafael (1990)
Sperschneider, G.A.V.: Logic: A Foundation for Computer Science, 1st edn. Addison-Wesley, Boston (1991)
Valiant, L.G.: The Complexity of Computing the Permanent. Theoretical Computer Science 8(2), 189–201 (1979)
van Benthem, J.: The Logic of Time: A Model-Theoretic Investigation into the Varieties of Temporal Ontology and Temporal Discourse, 2nd edn. Kluwer Academic Publishers, Dordrecht (1991)
Weikum, G., Theobald, M.: From Information to Knowledge: Harvesting Entities and Relationships from Web Sources. In: Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pp. 65–76. ACM, New York (2010)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Dylla, M., Theobald, M., Miliaraki, I. (2014). Querying and Learning in Probabilistic Databases. In: Koubarakis, M., et al. Reasoning Web. Reasoning on the Web in the Big Data Era. Reasoning Web 2014. Lecture Notes in Computer Science, vol 8714. Springer, Cham. https://doi.org/10.1007/978-3-319-10587-1_8
Download citation
DOI: https://doi.org/10.1007/978-3-319-10587-1_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10586-4
Online ISBN: 978-3-319-10587-1
eBook Packages: Computer ScienceComputer Science (R0)