DAW: Duplicate-AWare Federated Query Processing over the Web of Data

Saleem, Muhammad; Ngonga Ngomo, Axel-Cyrille; Xavier Parreira, Josiane; Deus, Helena F.; Hauswirth, Manfred

doi:10.1007/978-3-642-41335-3_36

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8218)

Included in the following conference series:

International Semantic Web Conference

Abstract

Over recent years, the Web of Data has developed into a large compendium of interlinked datasets from multiple domains. Because of the decentralised architecture of this compendium, several of these datasets contain duplicated data. Yet, so far, little attention has been paid to the effect of duplicated data on federated querying. This work presents DAW, a novel duplicate-aware approach to federated querying over the Web of Data. DAW is based on a combination of min-wise independent permutations and compact data summaries. It can be combined directly with existing federated query engines to achieve the same query recall values while querying fewer data sources. We extend three well-known federated query processing engines (DARQ, SPLENDID, and FedX) with DAW and compare our extensions with the original approaches. The comparison shows that DAW can greatly reduce the number of queries sent to the endpoints while keeping query recall high. Therefore, it can significantly improve the performance of federated query processing engines. Moreover, DAW provides a source selection mechanism that maximises query recall when query processing is limited to a subset of the sources.
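The abstract combines two ideas: min-wise independent permutations, used to estimate how much two sources overlap, and source selection that maximises recall when only a limited number of sources can be queried. The Python sketch below illustrates both under stated assumptions. Each source's relevant triples are summarised by a single MinHash signature, and sources are chosen greedily by how many unseen triples they are expected to contribute. The helpers `minhash`, `jaccard_estimate`, `estimated_new` and `select_sources`, and the parameters `NUM_HASHES` and `min_new`, are hypothetical. They do not reproduce DAW's actual summaries, index format or ranking.

```python
import hashlib
from typing import Dict, List, Sequence, Set

NUM_HASHES = 128    # signature length (hypothetical setting, not taken from DAW)
MAX_HASH = 2 ** 64  # larger than every 64-bit hash value


def _hash(seed: int, item: str) -> int:
    """Seeded 64-bit hash; each seed stands in for one random permutation."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big")


def minhash(items: Set[str], k: int = NUM_HASHES) -> List[int]:
    """Min-wise signature: for each of the k hash functions, keep the smallest value seen."""
    sig = [MAX_HASH] * k
    for item in items:
        for i in range(k):
            h = _hash(i, item)
            if h < sig[i]:
                sig[i] = h
    return sig


def jaccard_estimate(a: Sequence[int], b: Sequence[int]) -> float:
    """The share of agreeing positions estimates |A ∩ B| / |A ∪ B|."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def estimated_new(size: int, sig: Sequence[int],
                  union_sig: Sequence[int], union_size: float) -> float:
    """Estimated number of a source's items not yet covered by the chosen sources."""
    if union_size == 0:
        return float(size)
    j = jaccard_estimate(sig, union_sig)
    # From J = |S ∩ U| / (|S| + |U| - |S ∩ U|) it follows that |S ∩ U| = J(|S| + |U|) / (1 + J)
    overlap = j * (size + union_size) / (1 + j)
    return size - overlap


def select_sources(sources: Dict[str, Set[str]], budget: int,
                   min_new: float = 1.0) -> List[str]:
    """Greedily pick up to `budget` sources, each time taking the one expected to add the most new items."""
    sigs = {name: minhash(items) for name, items in sources.items()}
    sizes = {name: len(items) for name, items in sources.items()}
    chosen: List[str] = []
    union_sig = [MAX_HASH] * NUM_HASHES  # signature of the empty set
    union_size = 0.0
    remaining = set(sources)
    while remaining and len(chosen) < budget:
        gains = {n: estimated_new(sizes[n], sigs[n], union_sig, union_size) for n in remaining}
        best = max(sorted(remaining), key=gains.__getitem__)  # sorting gives a deterministic tie-break
        if gains[best] < min_new:
            break  # every remaining source appears to be (almost) entirely duplicated
        chosen.append(best)
        # The element-wise minimum of two signatures is the signature of their union
        union_sig = [min(x, y) for x, y in zip(union_sig, sigs[best])]
        union_size += gains[best]
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    a = {f"<s{i}> <p> <o{i}>" for i in range(100)}
    c = {f"<s{i}> <p> <o{i}>" for i in range(100, 150)}
    # Source B mirrors A exactly, so it should contribute nothing once A has been chosen
    print(select_sources({"A": a, "B": set(a), "C": c}, budget=2))  # → ['A', 'C']
```

Because the signature of a union is the element-wise minimum of the member signatures, the sketch never has to materialise the combined result set. Only the signatures and set sizes are kept. A greedy order is a simple way to approximate recall maximisation under a source budget. A fully duplicated mirror such as `B` is skipped in favour of a smaller source with new data.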

References

Acosta, M., Vidal, M.-E., Lampo, T., Castillo, J., Ruckhaus, E.: ANAPSID: an adaptive query processing engine for SPARQL endpoints. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 18–34. Springer, Heidelberg (2011)
Auer, S., Lehmann, J., Ngonga Ngomo, A.-C.: Introduction to linked data and its lifecycle on the web. In: Rudolph, S., Gottlob, G., Horrocks, I., van Harmelen, F. (eds.) Reasoning Web 2013. LNCS, vol. 8067, pp. 1–90. Springer, Heidelberg (2013)
Basca, C., Bernstein, A.: Avalanche: putting the spirit of the Web back into Semantic Web querying. In: SSWS, pp. 64–79 (November 2010)
Bender, M., Michel, S., Triantafillou, P., Weikum, G., Zimmer, C.: Improving collection selection with overlap awareness in P2P search engines. In: SIGIR, pp. 67–74 (2005)
Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)
Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. IJCSS 60, 327–336 (1998)
Drukh, N., Polyzotis, N., Garofalakis, M., Matias, Y.: Fractional XSketch synopses for XML databases. In: Bellahsène, Z., Milo, T., Rys, M., Suciu, D., Unland, R. (eds.) XSym 2004. LNCS, vol. 3186, pp. 189–203. Springer, Heidelberg (2004)
Görlitz, O., Staab, S.: SPLENDID: SPARQL endpoint federation exploiting VoID descriptions. In: COLD, ISWC (2011)
Harth, A., Hose, K., Karnstedt, M., Polleres, A., Sattler, K.-U., Umbrich, J.: Data summaries for on-demand queries over linked data. In: WWW, pp. 411–420 (2010)
Hernandez, T., Kambhampati, S.: Improving text collection selection with coverage and overlap statistics. In: WWW (Special interest tracks and posters), pp. 1128–1129 (2005)
Hose, K., Schenkel, R.: Towards benefit-based RDF source selection for SPARQL queries. In: SWIM, p. 2 (2012)
Ladwig, G., Tran, T.: Linked data query processing strategies. In: Patel-Schneider, P.F., Pan, Y., Hitzler, P., Mika, P., Zhang, L., Pan, J.Z., Horrocks, I., Glimm, B. (eds.) ISWC 2010, Part I. LNCS, vol. 6496, pp. 453–469. Springer, Heidelberg (2010)
Langegger, A., Wöß, W., Blöchl, M.: A semantic web middleware for virtual data integration on the web. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 493–507. Springer, Heidelberg (2008)
Li, Y., Heflin, J.: Using reformulation trees to optimize queries over distributed heterogeneous sources. In: Patel-Schneider, P.F., Pan, Y., Hitzler, P., Mika, P., Zhang, L., Pan, J.Z., Horrocks, I., Glimm, B. (eds.) ISWC 2010, Part I. LNCS, vol. 6496, pp. 502–517. Springer, Heidelberg (2010)
Michel, S., Bender, M., Triantafillou, P., Weikum, G.: IQN routing: Integrating quality and novelty in P2P querying and ranking. In: Ioannidis, Y., et al. (eds.) EDBT 2006. LNCS, vol. 3896, pp. 149–166. Springer, Heidelberg (2006)
Mitzenmacher, M.: Compressed Bloom filters. IEEE/ACM Trans. Netw. 10(5), 604–612 (2002)
Morsey, M., Lehmann, J., Auer, S., Ngonga Ngomo, A.-C.: DBpedia SPARQL benchmark: performance assessment with real queries on real data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 454–469. Springer, Heidelberg (2011)
Nie, Z., Kambhampati, S., Hernandez, T.: BibFinder/StatMiner: Effectively mining and using coverage and overlap statistics in data integration. In: VLDB, pp. 1097–1100 (2003)
Ntarmos, N., Triantafillou, P., Weikum, G.: Distributed hash sketches: Scalable, efficient, and accurate cardinality estimation for distributed multisets. ACM Trans. Comput. Syst., 27 (2009)
Polyzotis, N., Garofalakis, M.: Statistical synopses for graph-structured XML databases. In: SIGMOD, pp. 358–369 (2002)
Quilitz, B., Leser, U.: Querying distributed RDF data sources with SPARQL. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 524–538. Springer, Heidelberg (2008)
Schwarte, A., Haase, P., Hose, K., Schenkel, R., Schmidt, M.: FedX: Optimization techniques for federated query processing on linked data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 601–616. Springer, Heidelberg (2011)
Shokouhi, M., Zobel, J.: Federated text retrieval from uncooperative overlapped collections. In: SIGIR, pp. 495–502 (2007)
Si, L., Callan, J.P.: Relevant document distribution estimation method for resource selection. In: SIGIR, pp. 298–305 (2003)

Author information

Authors and Affiliations

IFI/AKSW, Universität Leipzig, PO 100920, D-04009, Leipzig, Germany
Muhammad Saleem & Axel-Cyrille Ngonga Ngomo

Editor information

Editors and Affiliations

Knowledge Media Institute, The Open University, Milton Keynes, UK
Harith Alani
Massachusetts Institute of Technology, Cambridge, MA, USA
Lalana Kagal
IBM Research, Hawthorne, NY, USA
Achille Fokoue
Free University Amsterdam, The Netherlands
Paul Groth
Technical University Darmstadt, Germany
Chris Biemann
Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
Josiane Xavier Parreira
VU Amsterdam, The Netherlands
Lora Aroyo
Stanford University, CA, USA
Natasha Noy
IBM Research, Yorktown Heights, NY, USA
Chris Welty
University of California, Santa Barbara, CA, USA
Krzysztof Janowicz

About this paper

Cite this paper

Saleem, M., Ngonga Ngomo, AC., Xavier Parreira, J., Deus, H.F., Hauswirth, M. (2013). DAW: Duplicate-AWare Federated Query Processing over the Web of Data. In: Alani, H., et al. The Semantic Web – ISWC 2013. ISWC 2013. Lecture Notes in Computer Science, vol 8218. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41335-3_36

DOI: https://doi.org/10.1007/978-3-642-41335-3_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41334-6
Online ISBN: 978-3-642-41335-3
eBook Packages: Computer Science, Computer Science (R0)
