Abstract
The amount of user-generated content on the Web is growing, and identifying high-quality content in a timely manner has become a problem. Many forums rely on their users to rate content quality manually, but this often yields too few ratings. Automated quality assessment models have largely evaluated linguistic features, but these techniques adapt poorly to the diverse writing styles and terminologies used by different forum communities. To address these limitations, we propose a novel model that evaluates content, usage, reputation, temporal and structural features of user-generated content. We employed a rule learner, a fuzzy classifier and Support Vector Machines to validate our model on three operational forums. Our model outperformed the existing models in our experiments, and we verified that the performance improvements were statistically significant.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chai, K., Wu, C., Potdar, V., Hayati, P. (2011). Automatically Measuring the Quality of User Generated Content in Forums. In: Wang, D., Reynolds, M. (eds) AI 2011: Advances in Artificial Intelligence. AI 2011. Lecture Notes in Computer Science, vol 7106. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25832-9_6
DOI: https://doi.org/10.1007/978-3-642-25832-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25831-2
Online ISBN: 978-3-642-25832-9
eBook Packages: Computer Science (R0)