Automatic extraction of geometric lip features with application to multi-modal speaker identification

Cited by: 6
Authors:
Arsic, Ivana [1]
Vilagut, Roger [1]
Thiran, Jean-Philippe [1]
Affiliation:
[1] Ecole Polytech Fed Lausanne, Signal Proc Inst, CH-1015 Lausanne, Switzerland
Funding:
Swiss National Science Foundation
DOI:
10.1109/ICME.2006.262594
Chinese Library Classification:
TP18 [Artificial intelligence theory]
Subject classification codes:
081104; 0812; 0835; 1405
Abstract:
In this paper we consider the problem of automatic extraction of geometric lip features for the purpose of multi-modal speaker identification. The use of visual information from the mouth region can be of great importance for improving the performance of a speaker identification system in noisy conditions. We propose a novel method for automated lip feature extraction that utilizes a color space transformation and a fuzzy c-means clustering technique. Using the obtained visual cues, closed-set audio-visual speaker identification experiments are performed on the CUAVE database [1], showing promising results.
Pages: 161+
Page count: 2
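The core idea in the abstract (map mouth-region pixels into a color representation in which lips stand out, separate lip from non-lip pixels with fuzzy c-means clustering, and then measure geometric lip features) can be sketched compactly. The Python snippet below is only an illustrative sketch under assumed choices: a pseudo-hue transform R/(R+G), a two-feature per-pixel representation, a plain NumPy fuzzy c-means, and mouth width/height as the geometric features. It is not the authors' exact pipeline.

```python
import numpy as np


def fuzzy_cmeans(X, c=2, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy c-means in plain NumPy. X has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)                     # memberships sum to 1 per sample
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]    # weighted cluster centers
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)
        inv = dist ** (-2.0 / (m - 1.0))                  # standard FCM membership update
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U


def lip_geometry(roi_rgb):
    """Return a rough (mouth width, mouth height) in pixels from an RGB mouth ROI."""
    rgb = roi_rgb.astype(np.float64) + 1e-6
    # Pseudo-hue R/(R+G) tends to be higher on lips than on surrounding skin
    # (illustrative transform; the paper's exact color transform is not reproduced here).
    pseudo_hue = rgb[..., 0] / (rgb[..., 0] + rgb[..., 1])
    lum = rgb.mean(axis=2) / 255.0
    feats = np.stack([pseudo_hue.ravel(), lum.ravel()], axis=1)
    centers, U = fuzzy_cmeans(feats, c=2)
    lip_cluster = int(np.argmax(centers[:, 0]))           # redder cluster ~ lips
    mask = (U.argmax(axis=1) == lip_cluster).reshape(roi_rgb.shape[:2])
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return 0.0, 0.0
    return float(xs.max() - xs.min() + 1), float(ys.max() - ys.min() + 1)


if __name__ == "__main__":
    # Synthetic mouth ROI: reddish ellipse ("lips") on a skin-colored background.
    h, w = 60, 90
    yy, xx = np.mgrid[0:h, 0:w]
    lips = ((xx - w / 2) / 30) ** 2 + ((yy - h / 2) / 10) ** 2 <= 1.0
    roi = np.full((h, w, 3), (205, 170, 150), dtype=np.uint8)
    roi[lips] = (180, 90, 100)
    print(lip_geometry(roi))   # roughly the ellipse's width and height in pixels
```

In an actual system, such measurements would be extracted per video frame from a detected mouth region and passed, together with the acoustic features, to the audio-visual identification back end.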
Related papers (50 in total; the first 10 are listed below):
  • [1] LIMUSE: LIGHTWEIGHT MULTI-MODAL SPEAKER EXTRACTION
    Liu, Qinghua
    Huang, Yating
    Hao, Yunzhe
    Xu, Jiaming
    Xu, Bo
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 488 - 495
  • [2] MUSE: MULTI-MODAL TARGET SPEAKER EXTRACTION WITH VISUAL CUES
    Pan, Zexu
    Tao, Ruijie
    Xu, Chenglin
    Li, Haizhou
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6678 - 6682
  • [3] A syntactic approach to automatic lip feature extraction for speaker identification
    Wark, T
    Sridharan, S
    [J]. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 3693 - 3696
  • [4] The use of temporal speech and lip information for multi-modal speaker identification via multi-stream HMM's
    Wark, T
    Sridharan, S
    Chandran, V
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 2389 - 2392
  • [5] Automatic Group Cohesiveness Detection With Multi-modal Features
    Zhu, Bin
    Guo, Xin
    Barner, Kenneth E.
    Boncelet, Charles
    [J]. ICMI'19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2019, : 577 - 581
  • [6] Lip features automatic extraction
    Lievin, M
    Luthon, F
    [J]. 1998 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING - PROCEEDINGS, VOL 3, 1998, : 168 - 172
  • [7] On-Line Multi-Modal Speaker Diarization
    Noulas, Athanasios K.
    Krose, Ben J. A.
    [J]. ICMI'07: PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES, 2007, : 350 - 357
  • [8] Automatic Detection and Verification of Pipeline Construction Features with Multi-modal data
    Vidal-Calleja, Teresa
    Miro, Jaime Valls
    Martin, Fernando
    Lingnau, Daniel C.
    Russell, David E.
    [J]. 2014 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2014), 2014, : 3116 - 3122
  • [9] Speaker identification using speech and lip features
    Ou, GB
    Li, X
    Yao, XC
    Jia, HB
    Murphey, YL
    [J]. PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), VOLS 1-5, 2005, : 2565 - 2570
  • [10] MSDWILD: MULTI-MODAL SPEAKER DIARIZATION DATASET IN THE WILD
    Liu, Tao
    Fang, Shuai
    Xiang, Xu
    Song, Hongbo
    Lin, Shaoxiong
    Sun, Jiaqi
    Han, Tianyuan
    Chen, Siyuan
    Yao, Binwei
    Liu, Sen
    Wu, Yifei
    Qian, Yanmin
    Yu, Kai
    [J]. INTERSPEECH 2022, 2022, : 1476 - 1480