1. School of Basic Sciences, Xi'an Technological University, Xi'an, Shaanxi 710021, China
2. School of Mathematics and Statistics, Xidian University, Xi'an, Shaanxi 710071, China
3. School of Computer Science and Engineering, Xi'an Technological University, Xi'an, Shaanxi 710021, China
[ "张文娟(1980—),女,副教授,E-mail:[email protected]" ]
[ "冯象初(1962—),男,教授,E-mail:[email protected]" ]
肖 锋(1976—),男,教授,E-mail:[email protected]
[ "黄姝娟(1974—),女,副教授,E-mail:[email protected]" ]
[ "李 欢(1998—),女,西安工业大学硕士研究生,E-mail:[email protected]" ]
Published in print: 2024-06-20
Published online: 2024-03-26
Received: 2023-01-03
Wenjuan ZHANG, Xiangchu FENG, Feng XIAO, et al. Relatively accelerated stochastic gradient algorithm for a class of non-smooth convex optimization problem[J]. Journal of Xidian University, 2024, 51(3): 147-157. DOI: 10.19665/j.issn1001-2400.20240301.
First-order optimization algorithms are widely used in machine learning, big data science, computer vision and other fields because of their simplicity and low per-iteration cost. However, most existing first-order methods require the objective function to have a globally Lipschitz continuous gradient, a condition that many practical problems do not satisfy. Building on the classical gradient descent (GD) algorithm, stochasticity and acceleration are introduced and a relatively accelerated stochastic gradient descent (RASGD) algorithm is proposed. Instead of a Lipschitz continuous gradient, the algorithm only requires the objective to satisfy a milder relative smoothness condition, obtained by generalizing the Euclidean distance to a Bregman distance. The convergence of RASGD depends on the uniform triangle scaling exponent (UTSE); to avoid the cost of tuning this parameter to its optimal value, an adaptive relatively accelerated stochastic gradient descent (ARASGD) algorithm is further proposed, which selects the uniform triangle scaling exponent adaptively. Theoretical convergence analysis shows that the objective function values of the iterates converge to the optimal value. Numerical experiments on the Poisson inverse problem and on a minimization problem in which the operator norm of the Hessian of the objective grows polynomially in the norm of the variable show that both ARASGD and RASGD converge faster than the relative stochastic gradient descent (RSGD) algorithm.
convex optimization; nonsmooth optimization; relative smoothness; stochastic programming; gradient method; accelerated stochastic gradient descent
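To make the setting described in the abstract concrete: a differentiable convex function f is L-smooth relative to a convex reference function h if, for all x, y in the domain,

$$
f(x) \le f(y) + \langle \nabla f(y),\, x-y \rangle + L\, D_h(x,y),
\qquad
D_h(x,y) = h(x) - h(y) - \langle \nabla h(y),\, x-y \rangle,
$$

where D_h is the Bregman distance generated by h; choosing h(x) = ½‖x‖² recovers the usual Lipschitz-gradient condition. The following Python sketch only illustrates a plain relative (Bregman) stochastic gradient step on a Poisson inverse problem of the kind mentioned in the abstract, assuming the Burg entropy reference h(x) = -Σ_j log x_j, with respect to which the Poisson objective is relatively smooth. It is not the paper's RASGD/ARASGD method (no acceleration, no adaptive UTSE), and the data, step-size rule, and helper names (A, b, stoch_grad, relative_sgd_step) are illustrative assumptions.

```python
# Minimal sketch (not the paper's RASGD/ARASGD): relative/Bregman stochastic
# gradient steps for the Poisson inverse problem
#     f(x) = sum_i [ a_i^T x - b_i * log(a_i^T x) ],
# which is relatively smooth w.r.t. the Burg entropy h(x) = -sum_j log(x_j).
# The data (A, b), step-size rule and helper names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 50
A = rng.random((m, n)) + 0.1                        # nonnegative measurement matrix
x_true = rng.random(n) + 0.1
b = rng.poisson(A @ x_true).astype(float) + 1e-3    # synthetic Poisson counts

def stoch_grad(x, batch):
    """Unbiased minibatch gradient of f at x over the sampled rows."""
    Ax = A[batch] @ x
    return (m / len(batch)) * (A[batch].T @ (1.0 - b[batch] / Ax))

def relative_sgd_step(x, g, eta):
    """Mirror step grad h(x+) = grad h(x) - eta*g for h(x) = -sum_j log x_j,
    i.e. 1/x+ = 1/x + eta*g, solved elementwise (clamped to stay positive)."""
    return 1.0 / np.maximum(1.0 / x + eta * g, 1e-12)

x = np.ones(n)
for k in range(2000):
    batch = rng.choice(m, size=10, replace=False)
    x = relative_sgd_step(x, stoch_grad(x, batch), eta=1e-3 / np.sqrt(k + 1))

print("objective:", np.sum(A @ x - b * np.log(A @ x)))
```

The closed-form update is what the Bregman distance buys here: with the Burg entropy reference, each step stays in the positive orthant and needs no Lipschitz constant of the gradient, only the relative smoothness constant, which is the condition the RASGD analysis relies on.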