Abstract
Traditional information filtering (IF) models often assume that the documents in a collection relate to only one topic. In reality, however, users' interests can be diverse, and the documents in a collection often involve multiple topics. Topic modelling was proposed to generate statistical models that represent multiple topics in a collection of documents, but in a topic model each topic is represented by a distribution over words, which has limited ability to represent the semantics of the topic distinctively. Patterns are generally regarded as more discriminative than single terms and are able to reveal the inner relations between words. This paper proposes a novel information filtering model, the Significant matched Pattern-based Topic Model (SPBTM). The SPBTM represents user information needs in terms of multiple topics, and each topic is represented by patterns. More importantly, the patterns are organized into groups based on their statistical and taxonomic features, from which the more representative patterns, called Significant Matched Patterns, can be identified and used to estimate document relevance. Experiments on benchmark data sets demonstrate that the SPBTM significantly outperforms the state-of-the-art models.
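To make the idea of pattern-based relevance estimation concrete, the following is a minimal sketch and not the SPBTM itself. The abstract does not give the scoring formula or the exact grouping criteria, so everything here is an assumption made for illustration: each topic is a hypothetical dictionary mapping term-set patterns to weights, a pattern "matches" a document when all of its terms occur in it, and the maximal matched patterns stand in for the Significant Matched Patterns. The names `maximal_matched_patterns` and `relevance`, the topic weights, and all numeric values are invented.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical representation: a topic maps each pattern (a set of terms)
# to a weight, e.g. a relative support. This is an assumption, not the
# paper's data structure.
Topic = Dict[FrozenSet[str], float]


def maximal_matched_patterns(topic: Topic, doc_terms: Set[str]) -> List[FrozenSet[str]]:
    """Return the patterns fully contained in the document, keeping only
    those that are not a proper subset of another matched pattern.
    Using maximal patterns as the 'significant' ones is an assumption."""
    matched = [p for p in topic if p <= doc_terms]
    return [p for p in matched if not any(p < q for q in matched)]


def relevance(topics: List[Topic], topic_weights: List[float], doc_terms: Set[str]) -> float:
    """Illustrative score: for each topic, sum the weights of its selected
    patterns and scale by the topic's weight in the user's interest model."""
    score = 0.0
    for topic, weight in zip(topics, topic_weights):
        selected = maximal_matched_patterns(topic, doc_terms)
        score += weight * sum(topic[p] for p in selected)
    return score


# Made-up example data: two topics with a few patterns each.
topics: List[Topic] = [
    {
        frozenset({"topic"}): 0.5,
        frozenset({"topic", "model"}): 0.3,
        frozenset({"pattern", "mining"}): 0.2,
    },
    {
        frozenset({"stock"}): 0.6,
        frozenset({"stock", "market"}): 0.4,
    },
]
topic_weights = [0.7, 0.3]

doc_a = {"topic", "model", "filtering"}
doc_b = {"stock", "market", "topic"}

print(relevance(topics, topic_weights, doc_a))
print(relevance(topics, topic_weights, doc_b))
```

In this sketch a document that touches both topics, such as `doc_b`, accumulates score from each of them, which mirrors the multi-topic motivation in the abstract. A faithful implementation would replace the maximality rule with the statistical and taxonomic grouping described in the paper.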
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Gao, Y., Xu, Y., Li, Y. (2014). Topical Pattern Based Document Modelling and Relevance Ranking. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2014. WISE 2014. Lecture Notes in Computer Science, vol 8786. Springer, Cham. https://doi.org/10.1007/978-3-319-11749-2_15
DOI: https://doi.org/10.1007/978-3-319-11749-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11748-5
Online ISBN: 978-3-319-11749-2
eBook Packages: Computer Science, Computer Science (R0)