Multi-grade Deep Learning

Cited by: 0
Author
Xu, Yuesheng [1]
Affiliation
[1] Old Dominion Univ, Dept Math & Stat, Norfolk, VA 23529 USA
Funding
National Science Foundation (USA)
Keywords
Deep learning; Deep neural network (DNN); Multi-grade deep learning (MGDL); EMPIRICAL MODE DECOMPOSITION; ONLINE GRADIENT-METHOD; DETERMINISTIC CONVERGENCE; NETWORK
DOI
10.1007/s42967-024-00474-y
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Deep learning requires solving a large nonconvex optimization problem to learn a deep neural network (DNN). The current deep learning model is single-grade; that is, it trains a DNN end-to-end by solving a single nonconvex optimization problem. When the neural network has many layers, carrying out this task efficiently is computationally challenging. The difficulty comes from learning all weight matrices and bias vectors from one large nonconvex optimization problem. Inspired by the human education process, which arranges learning in grades, we propose a multi-grade learning model: instead of solving one large optimization problem, we successively solve a number of small optimization problems, organized in grades, each of which learns a shallow neural network (a network with a few hidden layers). Specifically, each grade learns the leftover (residual) from the previous grade. In each grade, we learn a shallow neural network stacked on top of the neural network learned in the previous grades, whose parameters remain unchanged during training of the current and future grades. By dividing the task of learning a DNN into learning several shallow neural networks, one can alleviate the severity of the nonconvexity of the original large optimization problem. When all grades are completed, the final neural network is a stair-shape neural network, which is the superposition of the networks learned in all grades. Such a model enables us to learn a DNN much more effectively and efficiently. Moreover, multi-grade learning naturally leads to adaptive learning. We prove that, in the context of function approximation, if the neural network generated by a new grade is nontrivial, then the optimal error of the new grade is strictly smaller than the optimal error of the previous grade.
Furthermore, we provide numerical examples confirming that the proposed multi-grade model significantly outperforms the standard single-grade model and is much more robust to noise. They include three proof-of-concept examples; classification on two benchmark data sets, MNIST and Fashion MNIST, with two noise rates, which amounts to finding classifiers that are functions of 784 dimensions; and numerical solutions of the one-dimensional Helmholtz equation.
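The grade-by-grade scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the author's implementation: it assumes a one-dimensional regression target, a single ReLU hidden layer per grade, and plain full-batch gradient descent on the mean squared error, none of which the abstract specifies. The helper names `train_grade`, `multi_grade_fit`, and `predict`, as well as the width, step count, learning rate, and the choice to keep the best iterate, are hypothetical. Grade 1 fits y with g_1(x); grade l fits the current residual with g_l(h_{l-1}(x)), where h_{l-1} is the frozen hidden output of the previous grade; the final model is the sum of all grade outputs.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def train_grade(features, residual, width=32, steps=2000, lr=5e-3, seed=0):
    """Train one shallow grade: residual ~ relu(features @ W + b) @ v + c.

    Assumptions (not from the paper): one ReLU hidden layer and full-batch
    gradient descent on the mean squared error. The output layer starts at
    zero, so the grade initially contributes nothing; keeping the best
    iterate means the returned loss never exceeds the incoming residual's.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    W = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, width))
    b = np.zeros(width)
    v = np.zeros(width)
    c = 0.0
    best = (np.mean(residual ** 2), W.copy(), b.copy(), v.copy(), c)
    for _ in range(steps):
        z = features @ W + b
        h = relu(z)
        err = h @ v + c - residual
        loss = np.mean(err ** 2)
        if loss < best[0]:
            best = (loss, W.copy(), b.copy(), v.copy(), c)
        g = 2.0 * err / n                  # dLoss/dprediction
        gz = np.outer(g, v) * (z > 0)      # backpropagate through the ReLU
        W -= lr * (features.T @ gz)
        b -= lr * gz.sum(axis=0)
        v -= lr * (h.T @ g)
        c -= lr * g.sum()
    return best


def multi_grade_fit(x, y, grades=3, **kwargs):
    """Train grades one after another; earlier grades stay frozen."""
    features = x.reshape(-1, 1)
    residual = y.astype(float).copy()
    history = [np.mean(residual ** 2)]
    model = []
    for grade in range(grades):
        _, W, b, v, c = train_grade(features, residual, seed=grade, **kwargs)
        h = relu(features @ W + b)          # frozen hidden features of this grade
        residual = residual - (h @ v + c)   # the next grade learns this leftover
        history.append(np.mean(residual ** 2))
        model.append((W, b, v, c))
        features = h                        # the next grade is stacked on top
    return model, history


def predict(model, x):
    """Stair-shape network: the sum of every grade's output."""
    features = x.reshape(-1, 1)
    out = np.zeros(features.shape[0])
    for W, b, v, c in model:
        h = relu(features @ W + b)
        out += h @ v + c
        features = h
    return out


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 200)
    y = np.sin(3.0 * np.pi * x)
    model, history = multi_grade_fit(x, y, grades=3)
    print("training MSE after each grade:", history)
```

Because each grade's output layer starts at zero and the best iterate is kept, the recorded training error in this sketch never increases from one grade to the next. This is only a loose empirical echo of the abstract's result, which concerns optimal errors rather than this gradient-descent heuristic.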
Pages: 52