MULTILEVEL SPEECH INTELLIGIBILITY FOR ROBUST SPEAKER RECOGNITION

Cited by: 0
Authors
Nemala, Sridhar Krishna [1]
Elhilali, Mounya [1]
Affiliations
[1] Johns Hopkins Univ, Dept Elect & Comp Engn, Ctr Speech & Language Proc, Baltimore, MD 21218 USA
Keywords
Speech intelligibility; Voice-activity detection; Speaker recognition; Noise robustness
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In the real world, natural conversational speech is an amalgam of speech segments, silences, and environmental/background and channel effects. Labeling the different regions of an acoustic signal according to their information levels would greatly benefit all automatic speech processing tasks. In the current work, we propose a novel segmentation approach based on a perception-based measure of speech intelligibility. Unlike segmentation approaches based on various forms of voice-activity detection (VAD), the proposed parsing approach exploits higher-level perceptual information about signal intelligibility levels. This labeling information is integrated into a novel multilevel framework for the automatic speaker recognition task. The system processes the input acoustic signal along independent streams reflecting various levels of intelligibility and then fuses the decision scores from the multiple streams according to their intelligibility contribution. Our results show that the proposed system achieves significant improvements over standard baseline and VAD-based approaches, and attains performance similar to that obtained with oracle speech segmentation information.
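The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the weighted average below is an assumption rather than the authors' method; the function name `fuse_scores`, the three-stream setup, and all numeric values are hypothetical.

```python
def fuse_scores(stream_scores, stream_weights):
    """Combine per-stream decision scores into a single score.

    ASSUMPTION: fusion is a weighted average, with each weight standing in
    for the stream's intelligibility contribution. The abstract does not
    specify the actual rule.
    """
    if len(stream_scores) != len(stream_weights):
        raise ValueError("need exactly one weight per stream score")
    total_weight = sum(stream_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(s * w for s, w in zip(stream_scores, stream_weights))
    return weighted_sum / total_weight


# Hypothetical example: scores from high-, medium-, and low-intelligibility
# streams, with the high-intelligibility stream weighted most heavily.
scores = [2.0, 1.0, -1.0]
weights = [0.5, 0.25, 0.25]
fused = fuse_scores(scores, weights)
```

Normalizing by the total weight keeps the fused score on the same scale as the per-stream scores, so a single decision threshold could in principle be shared across utterances with different stream weightings.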
Pages: 4393-4396
Number of pages: 4