MAAS: Multi-modal Assignation for Active Speaker Detection

Cited by: 17
Authors
Leon Alcazar, Juan [1]
Heilbron, Fabian Caba [2]
Thabet, Ali K. [1]
Ghanem, Bernard [1]
Affiliations
[1] King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
[2] Adobe Research, San Jose, CA, USA
Keywords
DIARIZATION;
DOI
10.1109/ICCV48922.2021.00033
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Active speaker detection requires a mindful integration of multi-modal cues. Current methods focus on modeling and fusing short-term audiovisual features for individual speakers, often at frame level. We present a novel approach to active speaker detection that directly addresses the multi-modal nature of the problem and provides a straightforward strategy, where independent visual features (speakers) in the scene are assigned to a previously detected speech event. Our experiments show that a small graph data structure built from local information can approximate an instantaneous audio-visual assignment problem. Moreover, the temporal extension of this initial graph achieves a new state-of-the-art performance on the AVA-ActiveSpeaker dataset with a mAP of 88.8%.
Pages: 265-274
Page count: 10