CMNet: a novel model and design rationale based on comparison studies and synergy of CNN and MetaFormer

Cited by: 1
Authors
Yu, Haowen [1]
Chen, Liming [2]
Affiliations
[1] Univ Manchester, Fac Biol Med & Hlth, Oxford Rd, Manchester M13 9PL, England
[2] Univ Ulster, Sch Comp, Cromore Rd, Belfast BT52 1SA, North Ireland
Keywords
Transformer; MetaFormer; Attention mechanism; Convolutional neural network
DOI
10.1007/s00138-023-01446-7
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Convolutional- and Transformer-based backbone architectures are the two dominant, widely accepted model families in computer vision. Nevertheless, deciding which backbone architecture performs better, and under which circumstances, remains a challenge and therefore a focus of research. In this paper, we conduct an in-depth investigation into the differences in the macroscopic backbone design of CNN and Transformer models, with the ultimate purpose of developing new models that combine the strengths of both types of architecture for effective image classification. Specifically, we first analyze the structures of both models and identify four main differences. We then design four sets of ablation experiments on the ImageNet-1K dataset, using image classification as an example problem, to study the impact of these four differences on model performance. Based on the experimental results, we derive four observations as rules of thumb for designing a vision model backbone architecture. Informed by these findings, we then conceive a novel model called CMNet, which combines the experimentally validated best design practices of CNN and Transformer architectures. Finally, we carry out extensive experiments on CMNet using the same dataset against baseline classifiers. Initial results show that CMNet achieves a best top-1 accuracy of 80.08% on the ImageNet-1K validation set, which is highly competitive with previous classical models of similar computational complexity. Details of the implementation, algorithms, and code are publicly available on GitHub: https://github.com/Arwin-Yu/CMNet.
Pages: 13