Abstract
We consider the problem of minimizing the weighted sum of a smooth function f and a convex function P of n real variables, subject to m linear equality constraints. We propose a block-coordinate gradient descent method for solving this problem, with the coordinate block chosen by a Gauss-Southwell-q rule based on sufficient predicted descent. We establish global convergence to first-order stationarity for this method and, under a local error bound assumption, a linear rate of convergence. If f is convex with Lipschitz continuous gradient, then the method terminates in O(n²/ε) iterations with an ε-optimal solution. If P is separable, then the Gauss-Southwell-q rule is implementable in O(n) operations when m = 1 and in O(n²) operations when m > 1. In the special case of support vector machines training, for which f is convex quadratic, P is separable, and m = 1, this complexity bound is comparable to the best known bound for decomposition methods. If f is convex, then, by gradually reducing the weight on P to zero, the method can be adapted to solve the bilevel problem of minimizing P over the set of minima of f + δ_X, where δ_X denotes the indicator function of X, the closure of the feasible set. This has application in finding the least 1-norm solution of maximum-likelihood estimation.
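To make the abstract's method concrete, the following is a minimal sketch of single-coordinate gradient descent with a Gauss-Southwell-type rule based on predicted descent, for the unconstrained case (m = 0) with P = λ‖·‖₁. All names, the curvature constant L, and the stopping tolerance are illustrative assumptions; the method in the paper additionally handles the linear equality constraints and general block choices.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal map of t*|.|, applied componentwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cgd_gauss_southwell(grad_f, x0, lam=0.1, L=1.0, iters=200):
    """Minimize f(x) + lam*||x||_1 by single-coordinate prox-gradient
    steps, choosing the coordinate with the most negative predicted
    descent (a Gauss-Southwell-q-style rule). Unconstrained sketch only."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        g = grad_f(x)
        # Candidate prox step for every coordinate, using a simple
        # quadratic model with curvature L.
        d = soft_threshold(x - g / L, lam / L) - x
        # Predicted descent of the model if only coordinate j is updated.
        pred = g * d + 0.5 * L * d**2 + lam * (np.abs(x + d) - np.abs(x))
        j = int(np.argmin(pred))
        if pred[j] > -1e-12:      # no coordinate yields sufficient descent
            break
        x[j] += d[j]
    return x
```

For a separable quadratic f the rule reduces to picking the coordinate whose one-dimensional prox step most decreases the model, which is why (as the abstract notes) the rule is cheap to implement when P is separable.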
Additional information
Communicated by A. Miele.
This research was supported by the National Science Foundation, Grant No. DMS-0511283.
Cite this article
Tseng, P., Yun, S. Block-Coordinate Gradient Descent Method for Linearly Constrained Nonsmooth Separable Optimization. J Optim Theory Appl 140, 513–535 (2009). https://doi.org/10.1007/s10957-008-9458-3