Abstract
Linear support vector machines (SVMs) have become one of the most prominent classification algorithms for many natural language learning problems, such as sequential labeling tasks. Although the L2-regularized SVM yields slightly better accuracy than the L1-SVM, it produces many feature weights that are near zero but not exactly zero. In this paper, we present a cutting-weight algorithm that guides the optimization of the L2-SVM toward a sparse solution. To verify the proposed method, we conduct experiments on three well-known sequential labeling tasks and one dependency parsing task. The results show that our method achieves feature-parameter reduction rates of at least 400% compared with the original L2-SVM, with almost no change in accuracy or training time. In terms of run-time efficiency, our method is at least 20% faster than the original L2-regularized SVM on all tasks.
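The abstract does not spell out the cutting-weight algorithm itself, but the core idea it describes (steering an L2-regularized linear SVM toward sparsity by eliminating near-zero feature weights) can be illustrated with a minimal sketch. The training routine below is plain subgradient descent on the L2-regularized hinge loss, and `cut_weights` with its threshold `tau` is a hypothetical simplification for illustration, not the authors' actual procedure:

```python
import numpy as np

def train_l2_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the L2-regularized hinge loss:
    lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * w.x_i)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        violators = margins < 1  # examples inside or beyond the margin
        grad = lam * w - (X[violators] * y[violators, None]).sum(axis=0) / n
        w -= lr * grad
    return w

def cut_weights(w, tau=0.05):
    """Zero out near-zero weights to obtain a sparse model
    (illustrative stand-in for the paper's cutting-weight step)."""
    w_sparse = w.copy()
    w_sparse[np.abs(w_sparse) < tau] = 0.0
    return w_sparse

# Toy data: the label depends only on the first two features,
# so the remaining weights should shrink toward zero and get cut.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])

w = train_l2_svm(X, y)
w_sparse = cut_weights(w)
accuracy = np.mean(np.sign(X @ w_sparse) == y)
```

A sparse weight vector is what buys the run-time savings the abstract reports: at prediction time, only the nonzero coordinates of `w_sparse` contribute to each dot product, so pruned features can be skipped entirely.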
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Wu, YC., Lee, YS., Yang, JC., Yen, SJ. (2010). A Sparse L2-Regularized Support Vector Machines for Large-Scale Natural Language Learning. In: Cheng, PJ., Kan, MY., Lam, W., Nakov, P. (eds) Information Retrieval Technology. AIRS 2010. Lecture Notes in Computer Science, vol 6458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17187-1_33
DOI: https://doi.org/10.1007/978-3-642-17187-1_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17186-4
Online ISBN: 978-3-642-17187-1
eBook Packages: Computer Science (R0)