Speaker Separation Using Visual Speech Features and Single-channel Audio

Cited by: 0
Authors
Khan, Faheem [1]
Milner, Ben [1]
Affiliation
[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England
Keywords
Speaker separation; Wiener filter; visual features; audio-visual correlation; RECOGNITION;
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
This work proposes a method of single-channel speaker separation that uses visual speech information to extract a target speaker's speech from a mixture of speakers. The method requires a single audio input and visual features extracted from the mouth region of each speaker in the mixture. The visual information from the speakers is used to create a visually-derived Wiener filter. The Wiener filter gains are then non-linearly adjusted by a perceptual gain transform to improve the quality and intelligibility of the target speech. Experimental results are presented that measure the quality and intelligibility of the extracted target speech, and different perceptual gain transforms are compared. These results show that significant gains are achieved by applying the perceptual gain function.
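As a rough illustration of the processing described in the abstract, the sketch below applies a frequency-domain Wiener gain followed by a non-linear perceptual adjustment to a single-channel mixture. It is only a minimal sketch: the visually-derived estimates of the target and interfering speakers' power spectra are assumed to be given as placeholder inputs (the paper derives them from mouth-region visual features), and the power-law compression used here as the perceptual gain transform is an illustrative assumption, not necessarily one of the transforms compared in the paper.

```python
# Minimal sketch of Wiener-filter separation with a perceptual gain adjustment.
# NOTE: in the paper the target/interferer power spectra are visually derived;
# here they are random placeholders. The power-law transform below is an
# assumed, illustrative choice of perceptual gain function.
import numpy as np

def wiener_gain(target_power, interferer_power, eps=1e-12):
    """Per-frequency Wiener gain from estimated target/interferer power spectra."""
    return target_power / (target_power + interferer_power + eps)

def perceptual_transform(gain, alpha=0.5):
    """Hypothetical non-linear gain adjustment (power-law compression)."""
    return np.clip(gain ** alpha, 0.0, 1.0)

def separate_frame(mixture_spectrum, target_power, interferer_power):
    """Apply the perceptually adjusted Wiener gain to one mixture STFT frame."""
    g = perceptual_transform(wiener_gain(target_power, interferer_power))
    return g * mixture_spectrum  # complex spectrum of the extracted target

# Toy usage with placeholders standing in for the visually-derived estimates.
rng = np.random.default_rng(0)
n_bins = 257
mixture = rng.normal(size=n_bins) + 1j * rng.normal(size=n_bins)
est_target_power = rng.uniform(0.0, 1.0, size=n_bins)
est_interferer_power = rng.uniform(0.0, 1.0, size=n_bins)
enhanced = separate_frame(mixture, est_target_power, est_interferer_power)
```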
Pages: 3263 - 3267
Number of pages: 5
Related Papers
50 records in total
  • [1] Using audio and visual information for single channel speaker separation
    Khan, Faheem
    Milner, Ben
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1517 - 1521
  • [2] JOINT SINGLE-CHANNEL SPEECH SEPARATION AND SPEAKER IDENTIFICATION
    Mowlaee, P.
    Saeidi, R.
    Tan, Z. -H.
    Christensen, M. G.
    Franti, P.
    Jensen, S. H.
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4430 - 4433
  • [3] A Joint Approach for Single-Channel Speaker Identification and Speech Separation
    Mowlaee, Pejman
    Saeidi, Rahim
    Christensen, Mads Græsbøll
    Tan, Zheng-Hua
    Kinnunen, Tomi
    Franti, Pasi
    Jensen, Soren Holdt
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (09): 2586 - 2601
  • [4] Speaker Verification-Based Evaluation of Single-Channel Speech Separation
    Maciejewski, Matthew
    Watanabe, Shinji
    Khudanpur, Sanjeev
    INTERSPEECH 2021, 2021, : 3520 - 3524
  • [5] Linear regression on sparse features for single-channel speech separation
    Schmidt, Mikkel N.
    Olsson, Rasmus K.
    2007 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, 2007, : 149 - 152
  • [6] Candidate Speech Extraction from Multi-speaker Single-Channel Audio Interviews
    Pandharipande, Meghna
    Kopparapu, Sunil Kumar
    SPEECH AND COMPUTER, SPECOM 2023, PT I, 2023, 14338 : 210 - 221
  • [7] Using Visual Speech Information in Masking Methods for Audio Speaker Separation
    Khan, Faheem Ullah
    Milner, Ben P.
    Le Cornu, Thomas
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (10) : 1742 - 1754
  • [8] A VQ-based Single-Channel Audio Separation for Music/Speech Mixtures
    Asgari, Meysam
    Fallah, Mahdi
    Mehrizi, Elahe Abouie
    Mostafavi, Ali
    UKSIM 2009: ELEVENTH INTERNATIONAL CONFERENCE ON COMPUTER MODELLING AND SIMULATION, 2009, : 223 - +
  • [9] SINGLE-CHANNEL SPEECH EXTRACTION USING SPEAKER INVENTORY AND ATTENTION NETWORK
    Xiao, Xiong
    Chen, Zhuo
    Yoshioka, Takuya
    Erdogan, Hakan
    Liu, Changliang
    Dimitriadis, Dimitrios
    Droppo, Jasha
    Gong, Yifan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 86 - 90
  • [10] Single-Channel Multi-Speaker Separation using Deep Clustering
    Isik, Yusuf
    Le Roux, Jonathan
    Chen, Zhuo
    Watanabe, Shinji
    Hershey, John R.
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 545 - 549