Effective Local-Global Transformer for Natural Image Matting

Cited by: 5
Authors
Hu, Liangpeng [1]
Kong, Yating [1]
Li, Jide [1]
Li, Xiaoqiang [1]
Affiliation
[1] Shanghai Univ, Comp Engn & Sci Dept, Shanghai 200444, Peoples R China
Keywords
Natural image matting; transformer
DOI
10.1109/TCSVT.2023.3234983
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract
Learning-based matting methods have long been dominated by convolutional neural networks (CNNs). These methods mainly propagate the alpha matte according to the similarity between unknown and known regions. However, because common CNNs have insufficient receptive fields, the correlations they can model between pixels in unknown and known regions are limited, which leads to inaccurate estimates for unknown pixels that lie far from known regions. In this paper, we propose an Effective Local-Global Transformer for natural image matting (ELGT-Matting), which further expands the receptive field to establish wide-range correlations between unknown and known regions. The core component is the effective local-global transformer block, each of which consists of two modules: 1) a Window-Level Global MSA (Multi-head Self-Attention) module, which learns global context features among windows; and 2) a Local-Global Window MSA module, which combines coarse global context features with the corresponding fine local window features so that local window self-attention can capture both local and contextual information. Experiments demonstrate that ELGT-Matting performs favorably against other competitive approaches on the Composition-1K, Distinctions-646, and real-world AIM-500 datasets. In particular, it achieves a new state-of-the-art (SOTA) result on Composition-1K with an MSE of 0.00374.
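The abstract names the two modules of the block but not their internals. The following is a minimal single-head sketch of one plausible reading, written under our own assumptions and not taken from the authors' implementation: each window is mean-pooled into one coarse token, the Window-Level Global MSA is plain self-attention among those coarse tokens, and the Local-Global Window MSA lets each window's fine tokens attend to keys/values formed by concatenating that window's tokens with all global context tokens. Q/K/V projections are identity, there is a single head, and every function name (`elgt_block_sketch`, `window_level_global_msa`, etc.) is hypothetical.

```python
import math
from typing import List

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention(queries: List[Vec], keys: List[Vec], values: List[Vec]) -> List[Vec]:
    """Single-head scaled dot-product attention.
    ASSUMPTION: identity Q/K/V projections, no learned weights, one head."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(w[j] * values[j][c] for j in range(len(values))) for c in range(d)])
    return out


def partition_windows(fmap: List[List[Vec]], m: int) -> List[List[Vec]]:
    """Split an H x W grid of C-dim tokens into non-overlapping m x m windows."""
    h, w = len(fmap), len(fmap[0])
    assert h % m == 0 and w % m == 0, "H and W must be divisible by the window size"
    windows = []
    for i in range(0, h, m):
        for j in range(0, w, m):
            windows.append([fmap[y][x] for y in range(i, i + m) for x in range(j, j + m)])
    return windows


def mean_pool(tokens: List[Vec]) -> Vec:
    # ASSUMPTION: a window's coarse token is the mean of its fine tokens.
    n = len(tokens)
    return [sum(t[c] for t in tokens) / n for c in range(len(tokens[0]))]


def window_level_global_msa(windows: List[List[Vec]]) -> List[Vec]:
    """Hypothetical Window-Level Global MSA: attention among one coarse token per window."""
    coarse = [mean_pool(win) for win in windows]
    return attention(coarse, coarse, coarse)


def local_global_window_msa(windows: List[List[Vec]], global_ctx: List[Vec]) -> List[List[Vec]]:
    """Hypothetical Local-Global Window MSA: each window's fine tokens attend to
    their own window tokens concatenated with all coarse global context tokens."""
    outputs = []
    for win in windows:
        kv = win + global_ctx  # fine local tokens followed by coarse global tokens
        outputs.append(attention(win, kv, kv))
    return outputs


def elgt_block_sketch(fmap: List[List[Vec]], m: int) -> List[List[Vec]]:
    """Hypothetical block: global context among windows, then local-global window attention.
    Returns one list of output tokens per window (no residuals, norms, or MLP)."""
    windows = partition_windows(fmap, m)
    global_ctx = window_level_global_msa(windows)
    return local_global_window_msa(windows, global_ctx)


if __name__ == "__main__":
    # 4 x 4 feature map with 2-dim tokens, split into 2 x 2 windows.
    fmap = [[[float(y), float(x)] for x in range(4)] for y in range(4)]
    out = elgt_block_sketch(fmap, m=2)
    print(len(out), len(out[0]), len(out[0][0]))  # 4 4 2
```

The point of the sketch is the receptive-field argument in the abstract: a token in one window can draw on every other window through the coarse global tokens, while the cost stays close to that of windowed attention because only one extra token per window is added to each window's keys and values.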
Pages: 3888-3898
Number of pages: 11
Related papers
  • [1] Transformer-based local-global guidance for image captioning
    Parvin, Hashem
    Naghsh-Nilchi, Ahmad Reza
    Mohammadi, Hossein Mahvash
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 223
  • [2] Fully Convolutional Transformer with Local-Global Attention
    Lee, Sihaeng
    Yi, Eojindl
    Lee, Janghyeon
    Yoo, Jinsu
    Lee, Honglak
    Kim, Seung Hwan
    [J]. 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022: 552-559
  • [3] Natural Image Matting with Attended Global Context
    Zhang, Yi-Yi
    Niu, Li
    Makihara, Yasushi
    Zhang, Jian-Fu
    Zhao, Wei-Jie
    Yagi, Yasushi
    Zhang, Li-Qing
    [J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2023, 38 (03): 659-673
  • [4] Hierarchical Local-Global Transformer for Temporal Sentence Grounding
    Fang, Xiang
    Liu, Daizong
    Zhou, Pan
    Xu, Zichuan
    Li, Ruixuan
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 3263-3277
  • [5] Local-Global Feature-Aware Transformer Based Residual Network for Hyperspectral Image Denoising
    Wang, Fengfeng
    Li, Jie
    Yuan, Qiangqiang
    Zhang, Liangpei
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [6] DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
    Liang, Yuxuan
    Zhou, Pan
    Zimmermann, Roger
    Yan, Shuicheng
    [J]. COMPUTER VISION, ECCV 2022, PT XXXIV, 2022, 13694: 577-595
  • [7] Local-Global Transformer Neural Network for temporal action segmentation
    Tian, Xiaoyan
    Jin, Ye
    Tang, Xianglong
    [J]. MULTIMEDIA SYSTEMS, 2023, 29 (02): 615-626
  • [8] A Local-Global Interactive Vision Transformer for Aerial Scene Classification
    Peng, Ting
    Yi, Jingjun
    Fang, Yuan
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20