Abstract
A parallel randomized support vector machine (PRSVM) and a parallel randomized support vector regression (PRSVR) algorithm based on a randomized sampling technique are proposed in this paper. The proposed PRSVM and PRSVR have four major advantages over previous methods. (1) We prove that the proposed algorithms achieve an average convergence rate that is so far the fastest bounded convergence rate, among all SVM decomposition training algorithms to the best of our knowledge. The fast average convergence bound is achieved by a unique priority based sampling mechanism. (2) Unlike previous work (Provably fast training algorithm for support vector machines, 2001) the proposed algorithms work for general linear-nonseparable SVM and general non-linear SVR problems. This improvement is achieved by modeling new LP-type problems based on Karush–Kuhn–Tucker optimality conditions. (3) The proposed algorithms are the first parallel version of randomized sampling algorithms for SVM and SVR. Both the analytical convergence bound and the numerical results in a real application show that the proposed algorithm has good scalability. (4) We present demonstrations of the algorithms based on both synthetic data and data obtained from a real word application. Performance comparisons with SVMlight show that the proposed algorithms may be efficiently implemented.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Adler I, Shamir R (1993) Arandomized scheme for speeding up algorithms for linear and convex programming with high constraints-to-variable ratio. Math Program 61(1–3): 39–52
Balcazar J, Dai Y, Tanaka J, Watanabe O (2001) Provably fast training algorithm for support vector machines. In: The 1st IEEE International Conference on Data Mining (ICDM01)
Balcazar J, Dai Y, Tanaka J, Watanabe O (2002) Provably fast training algorithm for support vector machines. Technical Reports on Mathematical and Computing Sciences TR-C160, Tokyo Institute of Technology, Tokyo
Bastiere A (1998) Methods for multisensor classification of aiborne targets integrating evidence theory. Aerospace Sci Technol 2: 401–411
Blackard JA, Dean DJ (1999) Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover type from cartographic variables. Comput Electronics Agric 24: 131–151
Blake CL, Merz C (1998) UCI repository of machine learning databases http://www.ics.uci.edu/~mlearn/MLRepository.html
Cesa-Bianchi N (2004) Applications of regularized least squares to classification problems. In: Lecture Notes in Artificial Intelligence. Springer, Berlin, pp 14–18
Chellappa R, Zheng Q, Burlina P, Shekhar C, Eom KB (1997) On the positioning of multisensor imagery for exploitation and target recognition. In: Proceedings of the IEEE, vol 85, pp 120–138
Clarkson KL (1988) Las vegas algorithms for linear and integer programming when the dimension is small. In: The 29th IEEE symposium on foundations of computer science (FOCS’88)
Collobert R, Bengio S (2001) Svmtorch: Support vector machines for large-scale regress problems. J Mach Lear Res 1: 276–285
Collobert R, Bengio S, Bengio Y (2001) A parallel mixture of svms for very large scale problems. In:Neural Information Processing Systems, pp 633–640
Cortes C, Vapnik V (1995) Support-vector networks. Machine Learning 20: 273–297
Duda RO, Hart PE, Stork DG (2000) Pattern Classification. Wiley
Flake GW, Lawrence S (2002) Efficient svm regression training with smo. Mach Learn 46(1–3): 271–290
Freund Y, Schapire RE (1995) A decision-theoretic generalization of on-line learning and application to boosting. J Comput Syst Sci 55(1): 119–139
Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38(4): 367–378
Gartner B, Welzl E (2000) A simple sampling lemma: Analysis and applications in geometric optimization. In: Proceeding of the 16th annual ACM symposium on computational geometry (SCG)
Graf HP, Cosatto E, Bottou L, Dourdanovic I, Vapnik V (2005) Parallel support vector machine: The cascade svm. In: Advances in neural information processing systems
Jeon B, Landgrebe DA (1999) Decision fusion approach for multitemporal classification. IEEE Trans Geosci Remote Sens 37: 1227–1233
Joachims T (1998) Making large-scale svm learning practical. In: Advances in Kernel Methods - Support Vector Learning. MIT Press, Cambridge, pp 169–184
Li S, Kwok JT-Y, Tsang IW-H, Wang Y (1998) Fusion images with different focuses using support vector machines. IEEE Trans Neural Netw 15: 1555–1560
Mangasarian O, Musicant D (1999) Successive overrelaxation for support vectormachines. IEEE Trans Neural Netw 10(5): 1032–1037
Osuna E, Freund R, Girosi F (1997) An improved training algorithm for support vectormachines. In: IEEE workshop on neural networks for signal processing. pp 276–285
Osuna E, Freund R, Girosi F (1997) Support vector machine: training and applications, Artificial Intelligence Laboratory Memo No. 1602, Massachusetts Institute of Technology Massachusetts
Platt JC (1998) Sequenctial minimal optimization: a fast algorithm for training support vector machines, Technical Report MSR-TR-98-14, Microsof Research
Pohl C, Genderen JLV (1998) Mutisensor image fusion in remote sensing: Concepts, methods and applications. Int J Remote Sens 19: 823–854
Schmidt MS (1996) Identity speaks with support vector networks. In: The 28th symposium on the interface (INTERFACE-96), Sydney
Vapnik V (1995) The Nature of Statistical Learning Theory. Springer, New York
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Lu, Y., Roychowdhury, V. Parallel randomized sampling for support vector machine (SVM) and support vector regression (SVR). Knowl Inf Syst 14, 233–247 (2008). https://doi.org/10.1007/s10115-007-0082-6
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-007-0082-6