Particle dual averaging: optimization of mean field neural network with global convergence rate analysis

Cited by: 0
Authors
Nitanda, Atsushi [1 ]
Wu, Denny [2 ]
Suzuki, Taiji [3 ]
Affiliations
[1] Kyushu Inst Technol, RIKEN Ctr Adv Intelligence Project, Tokyo, Japan
[2] Univ Toronto, Vector Inst Artificial Intelligence, Toronto, ON, Canada
[3] Univ Tokyo, RIKEN Ctr Adv Intelligence Project, Tokyo, Japan
Keywords
deep learning; machine learning; stochastic particle dynamics; logarithmic Sobolev inequalities
DOI
10.1088/1742-5468/ac98a8
Chinese Library Classification (CLC)
O3 [Mechanics]
Discipline code
08; 0801
Abstract
We propose the particle dual averaging (PDA) method, which generalizes the dual averaging method of convex optimization to optimization over probability distributions with a quantitative runtime guarantee. The algorithm consists of an inner loop and an outer loop: the inner loop utilizes the Langevin algorithm to approximately solve for a stationary distribution, which is then optimized in the outer loop. The method can thus be interpreted as an extension of the Langevin algorithm that naturally handles nonlinear functionals on the probability space. An important application of the proposed method is the optimization of neural networks in the mean field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but for which quantitative convergence rates can be challenging to obtain. By adapting finite-dimensional convex optimization theory to the space of measures, we analyze PDA for regularized empirical/expected risk minimization and establish quantitative global convergence for learning two-layer mean field neural networks under more general settings than in prior analyses. Our theoretical results are supported by numerical simulations on neural networks of reasonable size.
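As a concrete illustration of the two-loop structure described in the abstract, below is a minimal NumPy sketch of PDA applied to a two-layer mean-field network on a toy 1-D regression problem. The model h(theta, x) = a * tanh(w . x), the squared loss, the polynomial (weight proportional to t) averaging scheme, and all hyperparameter values are illustrative assumptions for this sketch, not the exact setup or tuned constants of the paper.

```python
# Illustrative sketch of particle dual averaging (PDA) for a two-layer
# mean-field network f(x) = (1/M) * sum_i a_i * tanh(w_i . x) with squared
# loss on toy 1-D data. Hyperparameters and the weighting scheme are
# assumptions for this sketch, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(3x) + noise
n, d, M = 200, 1, 100                 # samples, input dim, particles
X = rng.uniform(-1, 1, size=(n, d))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(n)

lam_reg, lam_ent = 1e-3, 1e-3         # l2 and entropy regularization strengths
eta, T_outer, T_inner = 0.05, 20, 30  # Langevin step size, loop lengths

# Particles theta_i = (a_i, w_i) represent the empirical measure.
A = rng.standard_normal(M)
W = rng.standard_normal((M, d))

def predict(A, W, X):
    """Mean-field output: average of the particles' contributions."""
    return (np.tanh(X @ W.T) * A).mean(axis=1)

history = []                          # (inputs, residuals, weight) per outer step

for t in range(1, T_outer + 1):
    # Outer loop (dual averaging): linearize the loss at the current measure
    # and append it to the weighted running average of past linearizations.
    residual = predict(A, W, X) - y   # d(loss)/d(output) for squared loss
    history.append((X, residual, t))  # weight proportional to t
    total_w = sum(w for _, _, w in history)

    # Inner loop: Langevin algorithm targeting q ~ exp(-g_bar / lam_ent),
    # where g_bar is the dual-averaged potential defined by `history`.
    for _ in range(T_inner):
        grad_A = lam_reg * A
        grad_W = lam_reg * W
        for Xb, rb, w in history:
            Z = np.tanh(Xb @ W.T)                       # (n, M)
            coef = (w / total_w) * rb[:, None] / len(rb)
            grad_A += (coef * Z).sum(axis=0)
            grad_W += ((coef * A) * (1 - Z**2)).T @ Xb
        noise_scale = np.sqrt(2 * eta * lam_ent)
        A += -eta * grad_A + noise_scale * rng.standard_normal(A.shape)
        W += -eta * grad_W + noise_scale * rng.standard_normal(W.shape)

print(f"final training MSE: {np.mean((predict(A, W, X) - y) ** 2):.4f}")
```

The point mirrored here is that, roughly, the inner Langevin loop samples from the Gibbs distribution defined by the dual-averaged potential, while the outer loop only appends a newly linearized loss to that running average; the accumulated, weighted history is what distinguishes dual averaging from re-solving a fresh subproblem from scratch at every outer step.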
Pages: 51