ISTR: Mask-Embedding-Based Instance Segmentation Transformer

Authors
Hu, Jie [1]
Lu, Yao [1]
Zhang, Shengchuan [1]
Cao, Liujuan [1]
Affiliation
[1] Xiamen Univ, Key Lab Multimedia Trusted Percept & Efficient Com, Minist Educ China, Xiamen 361005, Peoples R China
Keywords
Transformers; Instance segmentation; Pipelines; Decoding; Principal component analysis; Discrete cosine transforms; Tuners; vision transformer; mask encoding and decoding; mutual information maximization;
DOI
10.1109/TIP.2024.3385980
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transformer-based instance-level recognition has recently attracted increasing research attention owing to its superior performance. However, although attempts have been made to encode masks as embeddings in Transformer-based frameworks, how to combine mask embeddings with spatial information in such frameworks has not been fully explored. In this paper, we revisit the design of mask-embedding-based pipelines and propose an Instance Segmentation TRansformer (ISTR) with Mask Meta-Embeddings (MME), which leverages the strength of Transformer models in encoding embedding information while incorporating spatial information from mask embeddings. ISTR incorporates a recurrent refining head that consists of a Dynamic Box Predictor (DBP), a Mask Information Generator (MIG), and a Mask Meta-Decoder (MMD). To improve the quality of mask embeddings, MME interprets the mask encoding-decoding process as a mutual information maximization problem, which unifies the objective functions of different decoding schemes, such as Principal Component Analysis (PCA) and Discrete Cosine Transform (DCT), under a meta-formulation. Under this meta-formulation, a learnable Spatial Mask Tuner (SMT) is further proposed, which fuses the spatial and embedding information produced by MIG and can significantly boost segmentation performance. The resulting variants, i.e., ISTR-PCA, ISTR-DCT, and ISTR-SMT, demonstrate the effectiveness and efficiency of incorporating mask embeddings into query-based instance segmentation pipelines. On the COCO dataset, ISTR surpasses all predominant mask-embedding-based models by a large margin and achieves performance competitive with concurrent state-of-the-art models. On the Cityscapes dataset, ISTR also outperforms several strong baselines. Our code is available at https://github.com/hujiecpp/ISTR.
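To make the idea of a mask embedding concrete, the following is a minimal sketch of a fixed DCT-style mask encoder and decoder, one of the decoding schemes the abstract names. It is not taken from the ISTR code base: the function names (`dct_matrix`, `encode_mask_dct`, `decode_mask_dct`, `mask_iou`), the 28x28 mask size, the choice to keep the top-left low-frequency block, and the 0.5 threshold are all assumptions made for illustration. The linked repository is the reference for the authors' actual implementation.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix; each row is one basis vector."""
    k = np.arange(n)[:, None]  # frequency index
    x = np.arange(n)[None, :]  # spatial index
    basis = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    basis[0, :] /= np.sqrt(2.0)  # rescale the DC row so the matrix is orthonormal
    return basis


def encode_mask_dct(mask, keep):
    """Encode a square binary mask as its keep x keep low-frequency DCT coefficients."""
    d = dct_matrix(mask.shape[0])
    coeffs = d @ mask.astype(np.float64) @ d.T  # separable 2D DCT
    return coeffs[:keep, :keep].ravel()  # assumed truncation: top-left block


def decode_mask_dct(embedding, size, keep):
    """Zero-pad the embedding, invert the 2D DCT, and threshold at 0.5 (assumed)."""
    d = dct_matrix(size)
    coeffs = np.zeros((size, size))
    coeffs[:keep, :keep] = embedding.reshape(keep, keep)
    return (d.T @ coeffs @ d) > 0.5


def mask_iou(a, b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0


# Toy example: a disk on a 28x28 grid, compressed to a 64-dimensional embedding.
yy, xx = np.ogrid[:28, :28]
mask = (yy - 14) ** 2 + (xx - 14) ** 2 <= 9 ** 2
embedding = encode_mask_dct(mask, keep=8)
recovered = decode_mask_dct(embedding, size=28, keep=8)
print(embedding.shape, mask_iou(mask, recovered))
```

In this sketch the decoder is a fixed linear map followed by a threshold, so it uses no learned spatial information. According to the abstract, that is the gap the learnable Spatial Mask Tuner is meant to address.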
Pages: 2895-2907
Number of pages: 13