Abstract
Blogging has gained popularity in recent years. Blogs, a form of user-generated content, are a rich source of information, and much research has been conducted on ways to classify them. In this paper, we present a solution for automatic blog classification through a new framework that uses Wikipedia’s category system. Our framework consists of two stages. The first stage maps meaningful terms in blog posts to unique concepts and disambiguates terms that belong to more than one concept. The second stage determines the categories to which these concepts belong. Our Wikipedia-based blog classification framework categorizes blogs by topic so that blog directories can support later browsing and retrieval. Experimental results confirm that the proposed framework categorizes blog posts effectively and efficiently.
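The two stages described above can be illustrated with a minimal sketch. It is not the framework's actual implementation. The lookup tables `TERM_TO_CONCEPTS` and `CONCEPT_TO_CATEGORIES`, the function names, and the context-overlap disambiguation rule are all hypothetical stand-ins for Wikipedia's concept and category data.

```python
from collections import Counter

# Hypothetical toy data standing in for Wikipedia; not taken from the paper.
TERM_TO_CONCEPTS = {
    "python": ["Python (programming language)", "Pythonidae"],  # ambiguous term
    "compiler": ["Compiler"],
    "snake": ["Snake"],
}
CONCEPT_TO_CATEGORIES = {
    "Python (programming language)": ["Computing"],
    "Compiler": ["Computing"],
    "Pythonidae": ["Animals"],
    "Snake": ["Animals"],
}


def map_terms_to_concepts(tokens):
    """Stage 1: map each known term to one concept, disambiguating by context."""
    # Build a context profile from the categories of unambiguous terms.
    context = Counter()
    for tok in tokens:
        candidates = TERM_TO_CONCEPTS.get(tok, [])
        if len(candidates) == 1:
            context.update(CONCEPT_TO_CATEGORIES[candidates[0]])

    concepts = []
    for tok in tokens:
        candidates = TERM_TO_CONCEPTS.get(tok, [])
        if not candidates:
            continue  # term has no known concept; skip it
        # Assumed rule: keep the candidate whose categories overlap most
        # with the context; ties fall back to the first listed candidate.
        best = max(
            candidates,
            key=lambda c: sum(context[cat] for cat in CONCEPT_TO_CATEGORIES[c]),
        )
        concepts.append(best)
    return concepts


def categorize(concepts):
    """Stage 2: count the categories of the found concepts, most frequent first."""
    votes = Counter()
    for concept in concepts:
        votes.update(CONCEPT_TO_CATEGORIES[concept])
    return votes.most_common()


def classify_post(text):
    """Run both stages on a whitespace-tokenized, lowercased post."""
    tokens = text.lower().split()
    return categorize(map_terms_to_concepts(tokens))
```

In this toy setup, the ambiguous term "python" resolves to the programming language when it appears next to "compiler" and to the snake family when it appears next to "snake". A real system would need proper tokenization and a far richer disambiguation signal than category overlap.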
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ayyasamy, R.K., Alhashmi, S.M., Eu-Gene, S., Tahayna, B. (2011). Enhancing Automatic Blog Classification Using Concept-Category Vectorization. In: Wang, Y., Li, T. (eds) Knowledge Engineering and Management. Advances in Intelligent and Soft Computing, vol 123. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25661-5_61
DOI: https://doi.org/10.1007/978-3-642-25661-5_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25660-8
Online ISBN: 978-3-642-25661-5
eBook Packages: Engineering, Engineering (R0)