Abstract
With the rise and spread of micro-blogging, sentiment classification of short texts has become a research hotspot, and a number of methods have been developed over the past decade. However, because Chinese and English differ in syntax, semantics and pragmatics, sentiment classification methods that are effective for English Twitter may fail on Chinese micro-blogs. In addition, the colloquialism and conciseness of short Chinese texts introduce further challenges to sentiment classification. In this work, a novel two-stage hybrid learning model is proposed for sentiment classification of Chinese micro-blogs. In the first stage, emotional scores are calculated over the whole dataset using an improved Chinese-oriented sentiment dictionary classification method, and data with extremely high or low scores are labeled directly. In the second stage, the remaining data are labeled using an integrated classification method based on a sentiment dictionary, support vector machine (SVM) and k-nearest neighbor (KNN). An improved feature selection method is adopted to enhance the discriminative power of the selected features. The two-stage hybrid framework makes the proposed method effective for sentiment classification of Chinese micro-blogs. Experiments on the COAE2014 (Chinese Opinion Analysis Evaluation 2014) dataset show that the proposed method outperforms other schemes.
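The two-stage idea can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, the thresholds `high` and `low`, the simple majority vote, and the names `lexicon_score`, `sign_label`, `majority_vote` and `two_stage_classify` are all assumptions made for illustration. Texts are assumed to be already segmented into tokens, English tokens are used only for readability, and the SVM and KNN classifiers are represented by arbitrary callables rather than trained models.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence

# Hypothetical toy lexicon; the paper's improved Chinese-oriented
# sentiment dictionary is not reproduced here.
LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0,
    "bad": -1.0, "awful": -2.0, "hate": -2.0,
}


def lexicon_score(tokens: Sequence[str]) -> float:
    """Sum the polarity weights of the tokens found in the lexicon."""
    return sum(LEXICON.get(tok, 0.0) for tok in tokens)


def sign_label(score: float) -> str:
    """Dictionary vote for stage 2: the sign of the score (assumed rule)."""
    return "pos" if score >= 0 else "neg"


def majority_vote(votes: Iterable[str]) -> str:
    """Return the most frequent label among the votes."""
    return Counter(votes).most_common(1)[0][0]


def two_stage_classify(
    docs: Sequence[Sequence[str]],
    voters: Sequence[Callable[[Sequence[str]], str]],
    high: float = 2.0,   # assumed threshold, not taken from the paper
    low: float = -2.0,   # assumed threshold, not taken from the paper
) -> List[str]:
    """Label extreme scores directly; send the rest to a voting ensemble."""
    labels = []
    for tokens in docs:
        score = lexicon_score(tokens)
        if score >= high:        # stage 1: confidently positive
            labels.append("pos")
        elif score <= low:       # stage 1: confidently negative
            labels.append("neg")
        else:                    # stage 2: dictionary vote plus other voters
            votes = [sign_label(score)] + [vote(tokens) for vote in voters]
            labels.append(majority_vote(votes))
    return labels
```

In practice the two entries of `voters` would wrap trained SVM and KNN models, so that three votes are cast for every text that stage 1 leaves unlabeled and a two-class vote cannot tie.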
Additional information
Foundation item: Projects(61573380, 61303185) supported by the National Natural Science Foundation of China; Project(13BTQ052) supported by the National Social Science Foundation of China; Project(2016M592450) supported by the China Postdoctoral Science Foundation; Project(2016JJ4119) supported by the Hunan Provincial Natural Science Foundation of China.
Cite this article
Li, Ff., Wang, Ht., Zhao, Rc. et al. Chinese micro-blog sentiment classification through a novel hybrid learning model. J. Cent. South Univ. 24, 2322–2330 (2017). https://doi.org/10.1007/s11771-017-3644-0
DOI: https://doi.org/10.1007/s11771-017-3644-0