Automatic extraction of geometric lip features with application to multi-modal speaker identification

Cited by: 6
Authors:
Arsic, Ivana [1]
Vilagut, Roger [1]
Thiran, Jean-Philippe [1]
Affiliation:
[1] Ecole Polytech Fed Lausanne, Signal Proc Inst, CH-1015 Lausanne, Switzerland
Funding:
Swiss National Science Foundation
DOI:
10.1109/ICME.2006.262594
Chinese Library Classification:
TP18 [Artificial intelligence theory]
Subject classification codes:
081104; 0812; 0835; 1405
Abstract:
In this paper we consider the problem of automatic extraction of geometric lip features for the purpose of multi-modal speaker identification. The use of visual information from the mouth region can be of great importance for improving the performance of a speaker identification system in noisy conditions. We propose a novel method for automated lip feature extraction that utilizes a color space transformation and a fuzzy c-means clustering technique. Using the obtained visual cues, closed-set audio-visual speaker identification experiments are performed on the CUAVE database [1], showing promising results.
Pages: 161+
Page count: 2
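The core idea in the abstract (map mouth-region pixels into a color representation in which lips stand out, separate lip from non-lip pixels with fuzzy c-means clustering, and then measure geometric lip features) can be sketched compactly. The Python snippet below is only an illustrative sketch under assumed choices: a pseudo-hue transform R/(R+G), a two-feature per-pixel representation, a plain NumPy fuzzy c-means, and mouth width/height as the geometric features. It is not the authors' exact pipeline.

```python
import numpy as np


def fuzzy_cmeans(X, c=2, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy c-means in plain NumPy. X has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)                     # memberships sum to 1 per sample
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]    # weighted cluster centers
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)
        inv = dist ** (-2.0 / (m - 1.0))                  # standard FCM membership update
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U


def lip_geometry(roi_rgb):
    """Return a rough (mouth width, mouth height) in pixels from an RGB mouth ROI."""
    rgb = roi_rgb.astype(np.float64) + 1e-6
    # Pseudo-hue R/(R+G) tends to be higher on lips than on surrounding skin
    # (illustrative transform; the paper's exact color transform is not reproduced here).
    pseudo_hue = rgb[..., 0] / (rgb[..., 0] + rgb[..., 1])
    lum = rgb.mean(axis=2) / 255.0
    feats = np.stack([pseudo_hue.ravel(), lum.ravel()], axis=1)
    centers, U = fuzzy_cmeans(feats, c=2)
    lip_cluster = int(np.argmax(centers[:, 0]))           # redder cluster ~ lips
    mask = (U.argmax(axis=1) == lip_cluster).reshape(roi_rgb.shape[:2])
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return 0.0, 0.0
    return float(xs.max() - xs.min() + 1), float(ys.max() - ys.min() + 1)


if __name__ == "__main__":
    # Synthetic mouth ROI: reddish ellipse ("lips") on a skin-colored background.
    h, w = 60, 90
    yy, xx = np.mgrid[0:h, 0:w]
    lips = ((xx - w / 2) / 30) ** 2 + ((yy - h / 2) / 10) ** 2 <= 1.0
    roi = np.full((h, w, 3), (205, 170, 150), dtype=np.uint8)
    roi[lips] = (180, 90, 100)
    print(lip_geometry(roi))   # roughly the ellipse's width and height in pixels
```

In an actual system, such measurements would be extracted per video frame from a detected mouth region and passed, together with the acoustic features, to the audio-visual identification back end.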
Related papers (50 in total; the first 10 are listed below):
  • [1] LIMUSE: LIGHTWEIGHT MULTI-MODAL SPEAKER EXTRACTION
    Liu, Qinghua
    Huang, Yating
    Hao, Yunzhe
    Xu, Jiaming
    Xu, Bo
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 488 - 495
  • [2] MUSE: MULTI-MODAL TARGET SPEAKER EXTRACTION WITH VISUAL CUES
    Pan, Zexu
    Tao, Ruijie
    Xu, Chenglin
    Li, Haizhou
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6678 - 6682
  • [3] A syntactic approach to automatic lip feature extraction for speaker identification
    Wark, T
    Sridharan, S
    [J]. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 3693 - 3696
  • [4] The use of temporal speech and lip information for multi-modal speaker identification via multi-stream HMM's
    Wark, T
    Sridharan, S
    Chandran, V
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 2389 - 2392
  • [5] Automatic Group Cohesiveness Detection With Multi-modal Features
    Zhu, Bin
    Guo, Xin
    Barner, Kenneth E.
    Boncelet, Charles
    [J]. ICMI'19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2019, : 577 - 581
  • [6] Lip features automatic extraction
    Lievin, M
    Luthon, F
    [J]. 1998 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING - PROCEEDINGS, VOL 3, 1998, : 168 - 172
  • [7] On-Line Multi-Modal Speaker Diarization
    Noulas, Athanasios K.
    Krose, Ben J. A.
    [J]. ICMI'07: PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES, 2007, : 350 - 357
  • [8] Automatic Detection and Verification of Pipeline Construction Features with Multi-modal data
    Vidal-Calleja, Teresa
    Miro, Jaime Valls
    Martin, Fernando
    Lingnau, Daniel C.
    Russell, David E.
    [J]. 2014 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2014), 2014, : 3116 - 3122
  • [9] Speaker identification using speech and lip features
    Ou, GB
    Li, X
    Yao, XC
    Jia, HB
    Murphey, YL
    [J]. PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), VOLS 1-5, 2005, : 2565 - 2570
  • [10] MSDWILD: MULTI-MODAL SPEAKER DIARIZATION DATASET IN THE WILD
    Liu, Tao
    Fang, Shuai
    Xiang, Xu
    Song, Hongbo
    Lin, Shaoxiong
    Sun, Jiaqi
    Han, Tianyuan
    Chen, Siyuan
    Yao, Binwei
    Liu, Sen
    Wu, Yifei
    Qian, Yanmin
    Yu, Kai
    [J]. INTERSPEECH 2022, 2022, : 1476 - 1480