Comparison of the efficiency of zero and first order minimization methods in neural networks
https://doi.org/10.26425/1816-4277-2022-11-48-55
Abstract
To minimize the objective function in neural networks, first-order methods are usually used, which involve the repeated calculation of the gradient. The number of variables in modern neural networks can reach many thousands or even millions. Numerous experiments show that the time needed to compute the gradient of a function of N variables analytically is approximately N/5 times longer than the time needed to compute the function itself. The article considers the possibility of using zero-order methods to minimize the function. In particular, a new zero-order minimization method, descent over two-dimensional subspaces, is proposed. The convergence rates of three different methods are compared: standard gradient descent with automatic step selection, coordinate descent with step selection for each coordinate, and descent over two-dimensional subspaces. It is shown that, in the considered neural network training problems, the efficiency of properly organized zero-order methods is not lower than that of gradient methods.
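To illustrate how a zero-order method with an individual step for each coordinate can be organized, the sketch below minimizes the classic Rosenbrock test function by coordinate descent in Python. This is a minimal illustration only: the accept/reject rule and the grow/shrink factors are assumptions made for this example and do not reproduce the exact step-selection procedure studied in the article.

import numpy as np

def rosenbrock(x):
    # Classic multidimensional test function with minimum at x = (1, ..., 1).
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

def coordinate_descent(f, x0, step=0.1, shrink=0.5, grow=1.2, n_epochs=200):
    # Zero-order coordinate descent with an individual step for each coordinate:
    # try a move along +/- e_i; if f decreases, accept it and enlarge that step,
    # otherwise shrink it. Only function values are used, no gradients.
    x = np.asarray(x0, dtype=float).copy()
    steps = np.full(x.size, step)
    fx = f(x)
    for _ in range(n_epochs):
        for i in range(x.size):
            for direction in (+1.0, -1.0):
                trial = x.copy()
                trial[i] += direction * steps[i]
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    steps[i] *= grow
                    break
            else:
                # Neither direction improved f: reduce the step for this coordinate.
                steps[i] *= shrink
    return x, fx

x_opt, f_opt = coordinate_descent(rosenbrock, np.zeros(5))
print(x_opt, f_opt)

Descent over two-dimensional subspaces, the method proposed in the article, generalizes this idea by minimizing over pairs of directions at each stage rather than along a single coordinate at a time.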
About the Authors
E. A. Gubareva
Russian Federation
Elena A. Gubareva – Cand. Sci. (Phys. and Math.), Assoc. Prof. at the Department of Mathematics and Computer Science
Moscow
S. I. Khashin
Russian Federation
Sergey I. Khashin – Cand. Sci. (Phys. and Math.), Assoc. Prof. at the Information Technologies and Applied Mathematics Department
Ivanovo
E. S. Shemyakova
United States
Ekaterina S. Shemyakova – Ph.D., Assoc. Prof. at the Department of Mathematics and Statistics
Toledo
For citations:
Gubareva E.A., Khashin S.I., Shemyakova E.S. Comparison of the efficiency of zero and first order minimization methods in neural networks. Vestnik Universiteta. 2022;1(11):48-55. (In Russ.) https://doi.org/10.26425/1816-4277-2022-11-48-55