Weakly-Supervised Audio-Visual Segmentation

Citations: 0
Authors
Mo, Shentong [1, 2]
Raj, Bhiksha [1, 2]
Affiliations
[1] CMU, Pittsburgh, PA 15213 USA
[2] MBZUAI, Abu Dhabi, United Arab Emirates
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Audio-visual segmentation is a challenging task that aims to predict pixel-level masks for sound sources in a video. Previous work applied a comprehensive, manually designed architecture supervised by a large number of pixel-wise accurate masks. However, these pixel-level masks are expensive to annotate and are not available in all cases. In this work, we aim to reduce the supervision to instance-level annotation, i.e., weakly-supervised audio-visual segmentation. We present a novel Weakly-Supervised Audio-Visual Segmentation framework, namely WS-AVS, that learns multi-scale audio-visual alignment through multi-scale multiple-instance contrastive learning. Extensive experiments on AVS-Bench demonstrate the effectiveness of WS-AVS for weakly-supervised audio-visual segmentation in both single-source and multi-source scenarios.
Pages: 14