Speaker Separation Using Visual Speech Features and Single-channel Audio

Cited by: 0
Authors
Khan, Faheem [1 ]
Milner, Ben [1 ]
Affiliations
[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England
Keywords
Speaker separation; Wiener filter; visual features; audio-visual correlation; recognition
DOI: not available
CLC number: TP18 [Artificial intelligence theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
This work proposes a method of single-channel speaker separation that uses visual speech information to extract a target speaker's speech from a mixture of speakers. The method requires a single audio input and visual features extracted from the mouth region of each speaker in the mixture. The visual information from the speakers is used to create a visually-derived Wiener filter. The Wiener filter gains are then non-linearly adjusted by a perceptual gain transform to improve the quality and intelligibility of the target speech. Experimental results are presented that evaluate the quality and intelligibility of the extracted target speech, and different perceptual gain transforms are compared. These show that significant improvements are achieved by applying the perceptual gain function.
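To make the processing steps in the abstract concrete, here is a minimal NumPy sketch of a Wiener filter with a non-linear gain adjustment applied to a single-channel mixture. It is an illustrative assumption, not the authors' implementation: the target and interferer power-spectrum estimates are assumed to be produced by a separate visual-to-audio mapping from each speaker's mouth-region features (not shown), and the power-law-plus-floor function stands in for the paper's perceptual gain transform, whose exact form is not given here.

    import numpy as np

    def wiener_gain(target_psd, interferer_psd, eps=1e-10):
        # Per time-frequency-bin Wiener gain from estimated power spectra.
        return target_psd / (target_psd + interferer_psd + eps)

    def perceptual_gain_transform(gain, exponent=2.0, floor=0.05):
        # Assumed non-linear adjustment: raising the gain to a power suppresses
        # interferer-dominated bins more strongly, while the spectral floor
        # limits musical-noise artefacts.
        return np.maximum(gain ** exponent, floor)

    def separate_target(mixture_stft, target_psd_est, interferer_psd_est):
        # mixture_stft: complex STFT of the single-channel mixture (freq x frames).
        # The PSD estimates have the same shape and are assumed to come from
        # visually-derived predictions of each speaker's spectrum.
        gain = perceptual_gain_transform(wiener_gain(target_psd_est, interferer_psd_est))
        return gain * mixture_stft

The enhanced target signal would then be resynthesised by an inverse STFT with overlap-add.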
Pages: 3263-3267
Page count: 5
Related Papers (50 in total)
  • [1] Using audio and visual information for single channel speaker separation
    Khan, Faheem
    Milner, Ben
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1517 - 1521
  • [2] JOINT SINGLE-CHANNEL SPEECH SEPARATION AND SPEAKER IDENTIFICATION
    Mowlaee, P.
    Saeidi, R.
    Tan, Z. -H.
    Christensen, M. G.
    Fränti, P.
    Jensen, S. H.
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4430 - 4433
  • [3] A Joint Approach for Single-Channel Speaker Identification and Speech Separation
    Mowlaee, Pejman
    Saeidi, Rahim
    Christensen, Mads Græsbøll
    Tan, Zheng-Hua
    Kinnunen, Tomi
    Fränti, Pasi
    Jensen, Søren Holdt
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (09): 2586 - 2601
  • [4] Speaker Verification-Based Evaluation of Single-Channel Speech Separation
    Maciejewski, Matthew
    Watanabe, Shinji
    Khudanpur, Sanjeev
    [J]. INTERSPEECH 2021, 2021, : 3520 - 3524
  • [5] Linear regression on sparse features for single-channel speech separation
    Schmidt, Mikkel N.
    Olsson, Rasmus K.
    [J]. 2007 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, 2007, : 149 - 152
  • [6] Candidate Speech Extraction from Multi-speaker Single-Channel Audio Interviews
    Pandharipande, Meghna
    Kopparapu, Sunil Kumar
    [J]. SPEECH AND COMPUTER, SPECOM 2023, PT I, 2023, 14338 : 210 - 221
  • [7] Using Visual Speech Information in Masking Methods for Audio Speaker Separation
    Khan, Faheem Ullah
    Milner, Ben P.
    Le Cornu, Thomas
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (10) : 1742 - 1754
  • [8] A VQ-based Single-Channel Audio Separation for Music/Speech Mixtures
    Asgari, Meysam
    Fallah, Mahdi
    Mehrizi, Elahe Abouie
    Mostafavi, Ali
    [J]. UKSIM 2009: ELEVENTH INTERNATIONAL CONFERENCE ON COMPUTER MODELLING AND SIMULATION, 2009, : 223 - +
  • [9] SINGLE-CHANNEL SPEECH EXTRACTION USING SPEAKER INVENTORY AND ATTENTION NETWORK
    Xiao, Xiong
    Chen, Zhuo
    Yoshioka, Takuya
    Erdogan, Hakan
    Liu, Changliang
    Dimitriadis, Dimitrios
    Droppo, Jasha
    Gong, Yifan
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 86 - 90
  • [10] Single-Channel Multi-Speaker Separation using Deep Clustering
    Isik, Yusuf
    Le Roux, Jonathan
    Chen, Zhuo
    Watanabe, Shinji
    Hershey, John R.
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 545 - 549