FUSION AND ORTHOGONAL PROJECTION FOR IMPROVED FACE-VOICE ASSOCIATION

Cited by: 10
Authors
Saeed, Muhammad Saad [1 ]
Khan, Muhammad Haris [2 ]
Nawaz, Shah [3 ,5 ]
Yousaf, Muhammad Haroon [1 ]
Del Bue, Alessio [3 ,4 ]
Affiliations
[1] Univ Engn & Technol Taxila, Swarm Robot Lab SRL NCRA, Rawalpindi, Pakistan
[2] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
[3] Ist Italiano Tecnol IIT, Pattern Anal & Comp Vis PAVIS, Genoa, Italy
[4] Ist Italiano Tecnol IIT, Visual Geometry & Modelling VGM, Genoa, Italy
[5] Deutsch Elektronen Synchrotron DESY, Hamburg, Germany
Keywords
Multimodal; Face-voice association; Cross-modal verification and matching;
DOI
10.1109/ICASSP43922.2022.9747704
Chinese Library Classification
O42 [Acoustics];
Discipline codes
070206 ; 082403 ;
Abstract
We study the problem of learning the association between faces and voices. Prior works adopt pairwise or triplet loss formulations to learn an embedding space amenable to the associated matching and verification tasks. Despite showing some progress, such loss formulations are restrictive due to their dependency on a distance-dependent margin parameter, poor runtime training complexity, and reliance on carefully crafted negative-mining procedures. In this work, we hypothesize that an enriched feature representation coupled with effective yet efficient supervision is necessary to realize a discriminative joint embedding space for improved face-voice association. To this end, we propose a lightweight, plug-and-play mechanism that exploits the complementary cues in both modalities to form enriched fused embeddings and clusters them based on their identity labels via orthogonality constraints. We coin our proposed mechanism fusion and orthogonal projection (FOP) and instantiate it in a two-stream pipeline. The overall resulting framework is evaluated on the large-scale VoxCeleb dataset across a multitude of tasks, including cross-modal verification and matching. Our method performs favourably against current state-of-the-art methods, and our proposed supervision formulation is more effective and efficient than those employed by contemporary methods.
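The abstract's core idea, fusing face and voice embeddings into a joint space and supervising that space so same-identity pairs align while different-identity pairs become orthogonal, can be sketched as follows. This is a minimal NumPy illustration under assumptions, not the paper's exact formulation: the fusion weights `W`, the equal weighting of the two loss terms, and the function names are all hypothetical.

```python
import numpy as np

def fuse(face_emb, voice_emb, W):
    """Illustrative fusion: project concatenated face/voice embeddings
    into a joint space with a (hypothetical) weight matrix W, then
    L2-normalize so dot products are cosine similarities."""
    fused = np.concatenate([face_emb, voice_emb], axis=1) @ W
    return fused / np.linalg.norm(fused, axis=1, keepdims=True)

def orthogonal_projection_loss(fused, labels):
    """Orthogonality-constraint sketch: drive same-identity cosine
    similarity toward 1 and different-identity similarity toward 0.
    The 1:1 weighting of the two terms is an assumption."""
    sim = fused @ fused.T                                  # pairwise cosine similarities
    same = (labels[:, None] == labels[None, :]).astype(float)
    diff = 1.0 - same
    s_pos = (same * sim).sum() / same.sum()                # mean same-identity similarity
    s_neg = (diff * np.abs(sim)).sum() / max(diff.sum(), 1.0)
    return (1.0 - s_pos) + s_neg                           # 0 when clusters are tight and mutually orthogonal
```

For example, with embeddings that are already perfectly clustered by identity and orthogonal across identities, the loss is zero; random embeddings yield a positive value, so minimizing it pushes the joint space toward the structure the abstract describes.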
Pages: 7057-7061
Page count: 5
Related papers (showing 10 of 50)
  • [1] An Efficient Momentum Framework for Face-Voice Association Learning
    Qiu, Yuanyuan
    Yu, Zhenning
    Gao, Zhenguo
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 271 - 283
  • [2] FaVoA: Face-Voice Association Favours Ambiguous Speaker Detection
    Carneiro, Hugo
    Weber, Cornelius
    Wermter, Stefan
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT I, 2021, 12891 : 439 - 450
  • [3] Robust face-voice based speaker identity verification using multilevel fusion
    Chetty, Girija
    Wagner, Michael
    IMAGE AND VISION COMPUTING, 2008, 26 (09) : 1249 - 1260
  • [4] Investigating feature-level fusion for checking liveness in face-voice authentication
    Chetty, G
    Wagner, M
    ISSPA 2005: The 8th International Symposium on Signal Processing and its Applications, Vols 1 and 2, Proceedings, 2005, : 66 - 69
  • [5] The contribution of personality impression to face-voice integration
    Mitsufuji, Yuka
    Ogawa, Hirokazu
    INTERNATIONAL JOURNAL OF PSYCHOLOGY, 2016, 51 : 169 - 169
  • [6] Face-voice authentication based on 3D face models
    Chetty, G
    Wagner, M
    COMPUTER VISION - ACCV 2006, PT I, 2006, 3851 : 559 - 568
  • [7] Learned face-voice pairings facilitate visual search
    Zweig, L. Jacob
    Suzuki, Satoru
    Grabowecky, Marcia
    PSYCHONOMIC BULLETIN & REVIEW, 2015, 22 (02) : 429 - 436
  • [8] A Low-Complexity Dynamic Face-Voice Feature Fusion Approach to Multimodal Person Recognition
    Shah, Dhaval
    Han, Kyu J.
    Narayanan, Shrikanth S.
    2009 11TH IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2009), 2009, : 24 - 31
  • [9] Speaking faces for face-voice speaker identity verification
    Chetty, Girija
    Wagner, Michael
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 513 - 516
  • [10] The Effect of Face-Voice Gender Consistency on Impression Evaluation
    Wen, Fangfang
    Gao, Jia
    Ke, Wenlin
    Zuo, Bin
    Dai, Yu
    Ju, Yiyan
    Long, Jiahui
    ARCHIVES OF SEXUAL BEHAVIOR, 2023, 52 (03) : 1123 - 1139