Abstract
Linear support vector machines (SVMs) have become one of the most prominent classification algorithms for many natural language learning problems, such as sequential labeling tasks. Although the L2-regularized SVM yields slightly better accuracy than the L1-SVM, it produces many feature weights that are near zero but not exactly zero. In this paper, we present a cutting-weight algorithm that guides the optimization of the L2-SVM toward a sparse solution. To verify the proposed method, we conduct experiments on three well-known sequential labeling tasks and one dependency parsing task. The results show that our method achieves feature-parameter reduction rates of at least 400% compared with the original L2-SVM, with almost no change in accuracy or training time. In terms of run-time efficiency, our method is at least 20% faster than the original L2-regularized SVM on all tasks.
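The abstract does not spell out the cutting-weight algorithm itself, but the core idea it describes (steering an L2-regularized linear SVM toward sparsity by eliminating near-zero feature weights) can be illustrated with a minimal sketch. The training routine below is plain subgradient descent on the L2-regularized hinge loss, and `cut_weights` with its threshold `tau` is a hypothetical simplification for illustration, not the authors' actual procedure:

```python
import numpy as np

def train_l2_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the L2-regularized hinge loss:
    lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * w.x_i)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        violators = margins < 1  # examples inside or beyond the margin
        grad = lam * w - (X[violators] * y[violators, None]).sum(axis=0) / n
        w -= lr * grad
    return w

def cut_weights(w, tau=0.05):
    """Zero out near-zero weights to obtain a sparse model
    (illustrative stand-in for the paper's cutting-weight step)."""
    w_sparse = w.copy()
    w_sparse[np.abs(w_sparse) < tau] = 0.0
    return w_sparse

# Toy data: the label depends only on the first two features,
# so the remaining weights should shrink toward zero and get cut.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])

w = train_l2_svm(X, y)
w_sparse = cut_weights(w)
accuracy = np.mean(np.sign(X @ w_sparse) == y)
```

A sparse weight vector is what buys the run-time savings the abstract reports: at prediction time, only the nonzero coordinates of `w_sparse` contribute to each dot product, so pruned features can be skipped entirely.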
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Wu, YC., Lee, YS., Yang, JC., Yen, SJ. (2010). A Sparse L2-Regularized Support Vector Machines for Large-Scale Natural Language Learning. In: Cheng, PJ., Kan, MY., Lam, W., Nakov, P. (eds) Information Retrieval Technology. AIRS 2010. Lecture Notes in Computer Science, vol 6458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17187-1_33
DOI: https://doi.org/10.1007/978-3-642-17187-1_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17186-4
Online ISBN: 978-3-642-17187-1
eBook Packages: Computer Science (R0)