An Image Patch is a Wave: Phase-Aware Vision MLP

Cited by: 67
Authors
Tang, Yehui [1, 2]
Han, Kai [2]
Guo, Jianyuan [2, 3]
Xu, Chang [3]
Li, Yanxi [2, 3]
Xu, Chao [1]
Wang, Yunhe [2]
Affiliations
[1] Peking Univ, Sch Artificial Intelligence, Beijing, Peoples R China
[2] Huawei Noahs Ark Lab, Hong Kong, Peoples R China
[3] Univ Sydney, Sch Comp Sci, Sydney, NSW, Australia
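Funding
National Natural Science Foundation of China; Australian Research Council
DOI
10.1109/CVPR52688.2022.01066
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract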
In the field of computer vision, recent works show that a pure MLP architecture built mainly by stacking fully-connected layers can achieve performance competitive with CNNs and transformers. The input image of a vision MLP is usually split into multiple tokens (patches), and existing MLP models aggregate these tokens directly with fixed weights, neglecting the varying semantic information of tokens from different images. To aggregate tokens dynamically, we propose to represent each token as a wave function with two parts, amplitude and phase. The amplitude is the original feature, and the phase term is a complex value that changes according to the semantic content of the input image. Introducing the phase term dynamically modulates the relationship between tokens and the fixed weights in the MLP. Based on this wave-like token representation, we establish a novel Wave-MLP architecture for vision tasks. Extensive experiments demonstrate that the proposed Wave-MLP is superior to state-of-the-art MLP architectures on various vision tasks such as image classification, object detection and semantic segmentation. The source code is available at https://github.com/huawei-noah/CV-Backbones/tree/master/wavemlp_pytorch and https://gitee.com/mindspore/models/tree/master/research/cv/wave_mlp.
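The effect of a phase term on fixed-weight aggregation can be illustrated with a small sketch. This is not the authors' implementation: it assumes the common wave form amplitude × e^(i·phase), uses hand-picked phases instead of phases estimated from the input, and the helper names `wave_token` and `aggregate` are hypothetical.

```python
import cmath
import math

def wave_token(amplitude, phase):
    """Build a token as complex values amplitude * e^(i * phase), per channel.

    Assumption: this exponential form is a standard way to write a wave;
    the abstract only states that a token has an amplitude and a phase.
    """
    return [a * cmath.exp(1j * p) for a, p in zip(amplitude, phase)]

def aggregate(tokens, weights):
    """Mix tokens channel by channel using fixed real weights."""
    channels = len(tokens[0])
    return [sum(w * tok[c] for w, tok in zip(weights, tokens))
            for c in range(channels)]

# Two tokens with identical amplitudes and the same fixed mixing weights.
amplitude = [1.0, 2.0]
weights = [0.5, 0.5]

# Hand-picked phases (hypothetical): equal phases vs. phases differing by pi.
in_phase = aggregate(
    [wave_token(amplitude, [0.0, 0.0]), wave_token(amplitude, [0.0, 0.0])],
    weights,
)
out_of_phase = aggregate(
    [wave_token(amplitude, [0.0, 0.0]), wave_token(amplitude, [math.pi, math.pi])],
    weights,
)

# Magnitude of the aggregated result in each channel.
in_phase_mag = [abs(z) for z in in_phase]
out_of_phase_mag = [abs(z) for z in out_of_phase]
```

With equal phases the two tokens reinforce each other, while a phase difference of π makes them cancel, even though the amplitudes and the mixing weights are unchanged. This is the sense in which a phase term can modulate how tokens interact with fixed weights.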
Pages: 10925-10934
Page count: 10