Abstract
Cross-lingual sentiment classification aims to use annotated sentiment resources in one language (typically English) to classify sentiment in another language. Most existing work relies on machine translation services to project information directly from one language to another. However, because machine translation quality is still far from satisfactory and term distributions may differ across languages, these techniques cannot match the performance of monolingual approaches. To overcome these limitations, we propose a novel learning model that combines active learning and self-training to incorporate unlabeled data from the target language into the learning process. The model also considers the density of the unlabeled data so that active learning avoids selecting outliers. We applied the proposed model to book review datasets in two different languages. Experiments showed that it can effectively reduce labeling effort compared with several baseline methods.
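The abstract does not give the scoring formulas, so the following is only a minimal sketch of one selection round under stated assumptions: uncertainty is measured by binary entropy, density by the average cosine similarity to the other unlabeled documents, and the two are multiplied (a common density-weighted uncertainty scheme, not necessarily the one used in the chapter). The function names `entropy`, `cosine`, `density`, and `select`, and the toy data, are hypothetical.

```python
import math

def entropy(p):
    """Binary entropy (in bits) of a positive-class probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def density(i, vectors):
    """Assumed density: mean cosine similarity of document i to the others."""
    sims = [cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i]
    return sum(sims) / len(sims) if sims else 0.0

def select(vectors, probs, n_query=1, n_self=1):
    """Pick documents for human labeling and for automatic pseudo-labeling.

    vectors: feature vectors of the unlabeled target-language documents
    probs:   current classifier's positive-class probability per document
    """
    idx = list(range(len(vectors)))
    # Active learning: prefer documents that are uncertain AND lie in a
    # dense region, so that isolated outliers score low.
    score = {i: entropy(probs[i]) * density(i, vectors) for i in idx}
    query = sorted(idx, key=lambda i: score[i], reverse=True)[:n_query]
    # Self-training: pseudo-label the most confident remaining documents.
    rest = [i for i in idx if i not in query]
    confident = sorted(rest, key=lambda i: abs(probs[i] - 0.5), reverse=True)[:n_self]
    pseudo = {i: int(probs[i] >= 0.5) for i in confident}
    return query, pseudo

# Toy data: document 3 is the most uncertain but is an outlier.
vectors = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]]
probs = [0.95, 0.55, 0.90, 0.50]
query, pseudo = select(vectors, probs)
print(query, pseudo)
```

In this toy setup, plain uncertainty sampling would choose document 3, whereas the density-weighted score prefers document 1, which is nearly as uncertain but sits close to the other documents. In a full loop, the queried documents would receive human labels, the pseudo-labeled ones would be added with their predicted labels, and the classifier would be retrained before the next round.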
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hajmohammadi, M.S., Ibrahim, R., Selamat, A. (2014). Density Based Active Self-training for Cross-Lingual Sentiment Classification. In: Jeong, H., S. Obaidat, M., Yen, N., Park, J. (eds) Advances in Computer Science and its Applications. Lecture Notes in Electrical Engineering, vol 279. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41674-3_146
DOI: https://doi.org/10.1007/978-3-642-41674-3_146
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41673-6
Online ISBN: 978-3-642-41674-3
eBook Packages: Engineering, Engineering (R0)