WaveNet: Wavelet Network With Knowledge Distillation for RGB-T Salient Object Detection

Cited by: 61
Authors
Zhou, Wujie [1]
Sun, Fan [1, 2]
Jiang, Qiuping [3]
Cong, Runmin [4]
Hwang, Jenq-Neng [5]
Affiliations
[1] Zhejiang Univ Sci & Technol, Sch Informat & Elect Engn, Hangzhou 310023, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 308232, Singapore
[3] Ningbo Univ, Sch Informat Sci & Engn, Ningbo 315211, Peoples R China
[4] Shandong Univ, Sch Control Sci & Engn, Jinan, Peoples R China
[5] Univ Washington, Dept Elect Engn, Seattle, WA 98105 USA
Funding
National Natural Science Foundation of China;
Keywords
Transformers; Feature extraction; Discrete wavelet transforms; Training; Knowledge engineering; Cross layer design; Convolutional neural networks; Wavelet; knowledge distillation; discrete wavelet transform; progressively stretched sine-cosine module; edge-aware module; FUSION; IMAGE;
DOI
10.1109/TIP.2023.3275538
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, various neural network architectures for computer vision have been devised, such as the visual transformer and multilayer perceptron (MLP). A transformer based on an attention mechanism can outperform a traditional convolutional neural network. Compared with the convolutional neural network and transformer, the MLP introduces less inductive bias and achieves stronger generalization. In addition, a transformer shows an exponential increase in the inference, training, and debugging times. Considering a wave function representation, we propose the WaveNet architecture that adopts a novel vision task-oriented wavelet-based MLP for feature extraction to perform salient object detection in RGB (red-green-blue)-thermal infrared images. In addition, we apply knowledge distillation to a transformer as an advanced teacher network to acquire rich semantic and geometric information and guide WaveNet learning with this information. Following the shortest-path concept, we adopt the Kullback-Leibler distance as a regularization term for the RGB features to be as similar to the thermal infrared features as possible. The discrete wavelet transform allows for the examination of frequency-domain features in a local time domain and time-domain features in a local frequency domain. We apply this representation ability to perform cross-modality feature fusion. Specifically, we introduce a progressively cascaded sine-cosine module for cross-layer feature fusion and use low-level features to obtain clear boundaries of salient objects through the MLP. Results from extensive experiments indicate that the proposed WaveNet achieves impressive performance on benchmark RGB-thermal infrared datasets. The results and code are publicly available at https://github.com/nowander/WaveNet.
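Illustrative sketch (not the authors' implementation): the abstract names two standard building blocks, a Kullback-Leibler term that pulls RGB features toward thermal infrared features, and the discrete wavelet transform. The pure-Python example below shows one minimal form of each. Everything specific in it is an assumption made for illustration: the softmax normalization of features, the direction of the divergence, the choice of the Haar wavelet, and the hypothetical function names `softmax`, `kl_divergence`, and `haar_dwt_1d`. The abstract does not specify any of these.

```python
import math


def softmax(values):
    """Normalize raw feature responses into a probability distribution."""
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def haar_dwt_1d(signal):
    """Single-level orthonormal Haar DWT of an even-length sequence.

    Returns (approximation, detail): the low-pass and high-pass halves.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


# Hypothetical flattened feature responses from the two modalities.
rgb_features = [0.2, 1.5, 0.3, 2.0]
thermal_features = [0.1, 1.7, 0.4, 1.8]

# Assumed regularizer: divergence of the RGB distribution from the thermal one.
regularizer = kl_divergence(softmax(rgb_features), softmax(thermal_features))

# Low-frequency (approx) and high-frequency (detail) parts of the RGB responses.
approx, detail = haar_dwt_1d(rgb_features)
```

In this sketch the regularizer is zero only when the two normalized feature vectors coincide, which is the sense in which such a term encourages the RGB features to resemble the thermal ones. The Haar transform is orthonormal, so the combined energy of `approx` and `detail` equals the energy of the input.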
Pages: 3027-3039
Page count: 13
Related papers
50 in total
  • [41] Lightweight Cross-Modal Information Mutual Reinforcement Network for RGB-T Salient Object Detection
    Lv, Chengtao
    Wan, Bin
    Zhou, Xiaofei
    Sun, Yaoqi
    Zhang, Jiyong
    Yan, Chenggang
    ENTROPY, 2024, 26 (02)
  • [42] CGMDRNet: Cross-Guided Modality Difference Reduction Network for RGB-T Salient Object Detection
    Chen, Gang
    Shao, Feng
    Chai, Xiongli
    Chen, Hangwei
    Jiang, Qiuping
    Meng, Xiangchao
    Ho, Yo-Sung
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6308 - 6323
  • [43] EAF-Net: an enhancement and aggregation-feedback network for RGB-T salient object detection
    He, Haiyang
    Wang, Jing
    Li, Xiaolin
    Hong, Minglin
    Huang, Shiguo
    Zhou, Tao
    MACHINE VISION AND APPLICATIONS, 2022, 33 (04)
  • [44] RGB-T salient object detection via excavating and enhancing CNN features
    Bi, Hongbo
    Zhang, Jiayuan
    Wu, Ranwan
    Tong, Yuyu
    Fu, Xiaowei
    Shao, Keyong
    APPLIED INTELLIGENCE, 2023, 53 : 25543 - 25561
  • [45] Transformer-Based Cross-Modal Integration Network for RGB-T Salient Object Detection
    Lv, Chengtao
    Zhou, Xiaofei
    Wan, Bin
    Wang, Shuai
    Sun, Yaoqi
    Zhang, Jiyong
    Yan, Chenggang
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (02) : 4741 - 4755
  • [46] Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection
    Tang, Hao
    Li, Zechao
    Zhang, Dong
    He, Shengfeng
    Tang, Jinhui
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (03) : 1958 - 1974
  • [47] Masked Visual Pre-training for RGB-D and RGB-T Salient Object Detection
    Qi, Yanyu
    Guo, Ruohao
    Li, Zhenbo
    Niu, Dantong
    Qu, Liao
    PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035 : 49 - 66
  • [48] Transformer-based cross-modality interaction guidance network for RGB-T salient object detection
    Luo, Jincheng
    Li, Yongjun
    Li, Bo
    Zhang, Xinru
    Li, Chaoyue
    Chenjin, Zhimin
    He, Jingyi
    Liang, Yifei
    NEUROCOMPUTING, 2024, 600
  • [49] CAE-Net: Cross-Modal Attention Enhancement Network for RGB-T Salient Object Detection
    Lv, Chengtao
    Wan, Bin
    Zhou, Xiaofei
    Sun, Yaoqi
    Hu, Ji
    Zhang, Jiyong
    Yan, Chenggang
    ELECTRONICS, 2023, 12 (04)
  • [50] Cross-Collaboration Weighted Fusion Network for RGB-T Salient Detection
    Wang, Yumei
    Dongye, Changlei
    Zhao, Wenxiu
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IV, ICIC 2024, 2024, 14865 : 301 - 312