Speaker Separation Using Visual Speech Features and Single-channel Audio

Cited by: 0
Authors
Khan, Faheem [1]
Milner, Ben [1]
Affiliation
[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England
Keywords
Speaker separation; Wiener filter; visual features; audio-visual correlation; RECOGNITION;
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
This work proposes a method of single-channel speaker separation that uses visual speech information to extract a target speaker's speech from a mixture of speakers. The method requires a single audio input and visual features extracted from the mouth region of each speaker in the mixture. The visual information from the speakers is used to create a visually-derived Wiener filter. The Wiener filter gains are then non-linearly adjusted by a perceptual gain transform to improve the quality and intelligibility of the target speech. Experimental results are presented that measure the quality and intelligibility of the extracted target speech, and different perceptual gain transforms are compared. These results show that significant gains are achieved by applying the perceptual gain function.
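As a rough illustration of the processing described in the abstract, the sketch below applies a frequency-domain Wiener gain followed by a non-linear perceptual adjustment to a single-channel mixture. It is only a minimal sketch: the visually-derived estimates of the target and interfering speakers' power spectra are assumed to be given as placeholder inputs (the paper derives them from mouth-region visual features), and the power-law compression used here as the perceptual gain transform is an illustrative assumption, not necessarily one of the transforms compared in the paper.

```python
# Minimal sketch of Wiener-filter separation with a perceptual gain adjustment.
# NOTE: in the paper the target/interferer power spectra are visually derived;
# here they are random placeholders. The power-law transform below is an
# assumed, illustrative choice of perceptual gain function.
import numpy as np

def wiener_gain(target_power, interferer_power, eps=1e-12):
    """Per-frequency Wiener gain from estimated target/interferer power spectra."""
    return target_power / (target_power + interferer_power + eps)

def perceptual_transform(gain, alpha=0.5):
    """Hypothetical non-linear gain adjustment (power-law compression)."""
    return np.clip(gain ** alpha, 0.0, 1.0)

def separate_frame(mixture_spectrum, target_power, interferer_power):
    """Apply the perceptually adjusted Wiener gain to one mixture STFT frame."""
    g = perceptual_transform(wiener_gain(target_power, interferer_power))
    return g * mixture_spectrum  # complex spectrum of the extracted target

# Toy usage with placeholders standing in for the visually-derived estimates.
rng = np.random.default_rng(0)
n_bins = 257
mixture = rng.normal(size=n_bins) + 1j * rng.normal(size=n_bins)
est_target_power = rng.uniform(0.0, 1.0, size=n_bins)
est_interferer_power = rng.uniform(0.0, 1.0, size=n_bins)
enhanced = separate_frame(mixture, est_target_power, est_interferer_power)
```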
Pages: 3263 - 3267
Number of pages: 5
Related Papers
50 records in total
  • [1] Using audio and visual information for single channel speaker separation
    Khan, Faheem
    Milner, Ben
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1517 - 1521
  • [2] JOINT SINGLE-CHANNEL SPEECH SEPARATION AND SPEAKER IDENTIFICATION
    Mowlaee, P.
    Saeidi, R.
    Tan, Z. -H.
    Christensen, M. G.
    Franti, P.
    Jensen, S. H.
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4430 - 4433
  • [3] A Joint Approach for Single-Channel Speaker Identification and Speech Separation
    Mowlaee, Pejman
    Saeidi, Rahim
    Christensen, Mads Græsbøll
    Tan, Zheng-Hua
    Kinnunen, Tomi
    Franti, Pasi
    Jensen, Soren Holdt
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (09): 2586 - 2601
  • [4] Speaker Verification-Based Evaluation of Single-Channel Speech Separation
    Maciejewski, Matthew
    Watanabe, Shinji
    Khudanpur, Sanjeev
    INTERSPEECH 2021, 2021, : 3520 - 3524
  • [5] Linear regression on sparse features for single-channel speech separation
    Schmidt, Mikkel N.
    Olsson, Rasmus K.
    2007 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, 2007, : 149 - 152
  • [6] Candidate Speech Extraction from Multi-speaker Single-Channel Audio Interviews
    Pandharipande, Meghna
    Kopparapu, Sunil Kumar
    SPEECH AND COMPUTER, SPECOM 2023, PT I, 2023, 14338 : 210 - 221
  • [7] Using Visual Speech Information in Masking Methods for Audio Speaker Separation
    Khan, Faheem Ullah
    Milner, Ben P.
    Le Cornu, Thomas
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (10) : 1742 - 1754
  • [8] A VQ-based Single-Channel Audio Separation for Music/Speech Mixtures
    Asgari, Meysam
    Fallah, Mahdi
    Mehrizi, Elahe Abouie
    Mostafavi, Ali
    UKSIM 2009: ELEVENTH INTERNATIONAL CONFERENCE ON COMPUTER MODELLING AND SIMULATION, 2009, : 223 - +
  • [9] SINGLE-CHANNEL SPEECH EXTRACTION USING SPEAKER INVENTORY AND ATTENTION NETWORK
    Xiao, Xiong
    Chen, Zhuo
    Yoshioka, Takuya
    Erdogan, Hakan
    Liu, Changliang
    Dimitriadis, Dimitrios
    Droppo, Jasha
    Gong, Yifan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 86 - 90
  • [10] Single-Channel Multi-Speaker Separation using Deep Clustering
    Isik, Yusuf
    Le Roux, Jonathan
    Chen, Zhuo
    Watanabe, Shinji
    Hershey, John R.
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 545 - 549