Abstract
Web user forums are a valuable means for users to resolve specific information needs, both interactively for the participants and statically for users who search or browse historical thread data. However, the complex structure of forum threads can make it difficult for users to extract relevant information. Information retrieval (IR) over forum threads is one important way to obtain useful information on questions asked by others. In this paper, we investigate the task of IR over web user forums by utilising the discourse structure of forum threads. Experimental results show that exploiting the characteristics of the discourse structure of forum threads can benefit IR when compared to previously published results.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, L., Kim, S.N., Baldwin, T. (2013). The Utility of Discourse Structure in Forum Thread Retrieval. In: Banchs, R.E., Silvestri, F., Liu, TY., Zhang, M., Gao, S., Lang, J. (eds) Information Retrieval Technology. AIRS 2013. Lecture Notes in Computer Science, vol 8281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45068-6_25
DOI: https://doi.org/10.1007/978-3-642-45068-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45067-9
Online ISBN: 978-3-642-45068-6
eBook Packages: Computer Science (R0)