Feature Selection for Language Independent Text Forum Summarization

Grozin, Vladislav A.; Gusarova, Natalia F.; Dobrenko, Natalia V.

doi:10.1007/978-3-319-24543-0_5


Part of the book series: Communications in Computer and Information Science (CCIS, volume 518)

Included in the following conference series:

International Conference on Knowledge Engineering and the Semantic Web


Abstract

The need for multilingual information retrieval is rising steadily. Specialized text-based forums on the Web are a valuable source of relevant information. However, extracting informative messages is often hindered by the large number of non-informative posts (so-called off-topic posts) and by the informal language commonly used on forums.

The paper addresses the task of automatically identifying posts that are potentially useful for sharing professional experience within text forums, irrespective of the forum's language. For our experiments we selected subsets of posts from various text forums in different languages. Manual annotation was performed by native-speaking experts. Textual, thread-based, and social graph features were extracted. To select a satisfactory set of language-independent forum features, we used gradient boosting models, the relative influence metric for model analysis, and the NDCG metric for measuring the quality of the selection method.

We have formed a satisfactory set of forum features that indicates a post's utility, does not require sophisticated linguistic analysis, and is suitable for practical use.
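The abstract names NDCG as the metric for judging how well a method ranks useful posts. The following is a minimal sketch of one common NDCG formulation (linear gain, logarithmic discount), not the authors' implementation. The function names `dcg` and `ndcg`, the optional cutoff `k`, and the example scores and labels are assumptions made for illustration only.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance labels given in ranked order."""
    # The item at 0-based position `rank` is discounted by log2(rank + 2),
    # so the top-ranked item has a discount of log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(scores, labels, k=None):
    """NDCG of the ranking induced by `scores` against expert `labels`.

    scores: predicted usefulness per post (higher means ranked earlier).
    labels: expert-assigned usefulness per post (assumed non-negative).
    k:      optional cutoff; only the top-k positions are evaluated.
    """
    # Rank post indices by predicted score, best first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    # The ideal ranking sorts posts by their true labels.
    ideal = sorted(labels, reverse=True)
    if k is not None:
        ranked, ideal = ranked[:k], ideal[:k]
    ideal_dcg = dcg(ideal)
    # If no post is useful at all, define NDCG as 0 to avoid dividing by zero.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical model scores and expert labels for three posts.
    print(ndcg([0.9, 0.1, 0.5], [2, 0, 1]))  # ranking agrees with the labels
    print(ndcg([0.1, 0.9, 0.5], [2, 0, 1]))  # most useful post ranked last
```

A ranking that orders posts exactly as the expert labels do scores 1, and rankings that push useful posts down the list score lower. This is what makes the metric suitable for comparing feature subsets by the quality of the post ranking they produce.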



Author information

Authors and Affiliations

National Research University of Information Technologies, Mechanics and Optics, Saint-Petersburg, 197101, Russia
Vladislav A. Grozin, Natalia F. Gusarova & Natalia V. Dobrenko


Corresponding author

Correspondence to Vladislav A. Grozin.

Editor information

Editors and Affiliations

Complexible Inc, Washington, District of Columbia, USA
Pavel Klinov
ITMO University, St. Petersburg, Russia
Dmitry Mouromtsev


About this paper

Cite this paper

Grozin, V.A., Gusarova, N.F., Dobrenko, N.V. (2015). Feature Selection for Language Independent Text Forum Summarization. In: Klinov, P., Mouromtsev, D. (eds) Knowledge Engineering and Semantic Web. KESW 2015. Communications in Computer and Information Science, vol 518. Springer, Cham. https://doi.org/10.1007/978-3-319-24543-0_5


DOI: https://doi.org/10.1007/978-3-319-24543-0_5
Published: 30 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24542-3
Online ISBN: 978-3-319-24543-0
eBook Packages: Computer Science, Computer Science (R0)
