Topical Pattern Based Document Modelling and Relevance Ranking

Gao, Yang; Xu, Yue; Li, Yuefeng

doi:10.1007/978-3-319-11749-2_15

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8786)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Traditional information filtering (IF) models often assume that the documents in a collection relate to a single topic. In reality, users’ interests can be diverse, and the documents in a collection often involve multiple topics. Topic modelling was proposed to generate statistical models that represent multiple topics in a collection of documents, but a topic model represents each topic as a distribution over words, which limits how distinctively it can capture the topic's semantics. Patterns are generally considered more discriminative than single terms and can reveal the inner relations between words. This paper proposes a novel information filtering model, the Significant matched Pattern-based Topic Model (SPBTM). The SPBTM represents user information needs in terms of multiple topics, and each topic is represented by patterns. More importantly, the patterns are organized into groups based on their statistical and taxonomic features, from which the more representative patterns, called Significant Matched Patterns, can be identified and used to estimate document relevance. Experiments on benchmark data sets demonstrate that the SPBTM significantly outperforms the state-of-the-art models.
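
To make the idea of pattern-based relevance concrete, the following is a minimal sketch and not the SPBTM itself. It assumes each topic is a weighted list of patterns (sets of terms), that a pattern matches a document when all of its terms occur in it, and that the "more representative" matches are simply the matched patterns not contained in a longer matched pattern of the same topic. That selection rule, the weights, and all names (`matched_patterns`, `keep_maximal`, `relevance`, `rank`) are illustrative assumptions; the abstract does not specify how Significant Matched Patterns are chosen or scored.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Pattern = FrozenSet[str]
WeightedPattern = Tuple[Pattern, float]       # (pattern, hypothetical weight)
Topic = Tuple[float, List[WeightedPattern]]   # (hypothetical topic weight, patterns)


def matched_patterns(doc_terms: Set[str],
                     patterns: List[WeightedPattern]) -> List[WeightedPattern]:
    """Return the patterns whose terms all occur in the document."""
    return [(p, w) for p, w in patterns if p <= doc_terms]


def keep_maximal(matches: List[WeightedPattern]) -> List[WeightedPattern]:
    """Assumed stand-in for 'significant' matches: drop any matched pattern
    that is a proper subset of another matched pattern."""
    return [(p, w) for p, w in matches if not any(p < q for q, _ in matches)]


def relevance(doc_terms: Set[str], topics: List[Topic]) -> float:
    """Sum topic_weight * pattern_weight over the retained matches of each topic."""
    score = 0.0
    for topic_weight, patterns in topics:
        for _, w in keep_maximal(matched_patterns(doc_terms, patterns)):
            score += topic_weight * w
    return score


def rank(docs: Dict[str, Set[str]], topics: List[Topic]) -> List[Tuple[str, float]]:
    """Order documents by descending relevance score."""
    scored = [(doc_id, relevance(terms, topics)) for doc_id, terms in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy user-interest model with two topics (all values invented for illustration).
topics: List[Topic] = [
    (0.6, [(frozenset({"topic", "model"}), 0.5),
           (frozenset({"topic"}), 0.3),
           (frozenset({"pattern", "mining"}), 0.4)]),
    (0.4, [(frozenset({"information", "filtering"}), 0.7)]),
]
docs = {
    "d1": {"topic", "model", "information", "filtering"},
    "d2": {"topic", "pattern"},
    "d3": {"weather"},
}
ranking = rank(docs, topics)
```

In this toy setup, `d1` matches both `{topic, model}` and `{topic}` in the first topic, but only the longer pattern contributes, which illustrates why selecting representative patterns can avoid counting overlapping evidence twice.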

Author information

Authors and Affiliations

Science and Engineering, Queensland University of Technology, Brisbane, Australia
Yang Gao, Yue Xu & Yuefeng Li

Editor information

Editors and Affiliations

University of New South Wales, Sydney, Australia
Boualem Benatallah
Boston University, Boston, MA, USA
Azer Bestavros
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos & Athena Vakali
Victoria University, Footscray, VIC, Australia
Yanchun Zhang

About this paper

Cite this paper

Gao, Y., Xu, Y., Li, Y. (2014). Topical Pattern Based Document Modelling and Relevance Ranking. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_15

DOI: https://doi.org/10.1007/978-3-319-11749-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11748-5
Online ISBN: 978-3-319-11749-2
eBook Packages: Computer Science, Computer Science (R0)
