Speech and crosstalk detection in multichannel audio

Cited by: 67
Authors
Wrigley, SN [1]
Brown, GJ
Wan, V
Renals, S
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9LW, Midlothian, Scotland
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005 / Vol. 13 / No. 1
Keywords
crosstalk; cochannel interference; meetings; feature extraction; hidden Markov models (HMM); speech recognition
DOI
10.1109/TSA.2004.838531
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The analysis of scenarios in which a number of microphones record the activity of speakers, such as in a round-table meeting, presents a number of computational challenges. For example, if each participant wears a microphone, the microphone receives speech both from its wearer (local speech) and from other participants (crosstalk). The recorded audio can be broadly classified into four classes: local speech, crosstalk plus local speech, crosstalk alone, and silence. We describe two experiments related to the automatic classification of audio into these four classes. The first experiment attempted to optimize a set of acoustic features for use with a Gaussian mixture model (GMM) classifier. A large set of potential acoustic features was considered, some of which have been employed in previous studies. The best-performing features were found to be kurtosis, "fundamentalness," and cross-correlation metrics. The second experiment used these features to train an ergodic hidden Markov model classifier. Tests performed on a large corpus of recorded meetings show classification accuracies of up to 96% and automatic speech recognition performance close to that obtained using ground-truth segmentation.
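The abstract names kurtosis and cross-correlation among the best-performing features. The sketch below shows one simplified way to compute such per-frame values for a pair of channels in pure Python. It is an illustration under stated assumptions, not the authors' method: the use of excess kurtosis, the "peak normalised cross-correlation over a lag range" definition, and the parameters `frame_len`, `hop` and `max_lag` are all hypothetical choices, and the "fundamentalness" feature is not reproduced.

```python
import math
from typing import List, Sequence


def kurtosis(frame: Sequence[float]) -> float:
    """Excess kurtosis E[(x - mu)^4] / sigma^4 - 3 (0 for Gaussian data).

    Assumption: excess (not raw) kurtosis; the paper's exact definition may differ.
    """
    n = len(frame)
    mu = sum(frame) / n
    m2 = sum((x - mu) ** 2 for x in frame) / n
    m4 = sum((x - mu) ** 4 for x in frame) / n
    if m2 == 0.0:
        return 0.0  # constant (e.g. digitally silent) frame: avoid division by zero
    return m4 / (m2 * m2) - 3.0


def max_norm_xcorr(a: Sequence[float], b: Sequence[float], max_lag: int) -> float:
    """Peak normalised cross-correlation of two equal-length frames.

    Lags range over [-max_lag, max_lag]; by Cauchy-Schwarz every value lies in [-1, 1].
    """
    energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if energy == 0.0:
        return 0.0  # one channel is silent: report no correlation
    n = len(a)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        s = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                s += a[i] * b[j]
        best = max(best, s / energy)
    return best


def frame_features(local: Sequence[float], other: Sequence[float],
                   frame_len: int, hop: int, max_lag: int) -> List[List[float]]:
    """Hypothetical helper: [kurtosis, peak xcorr] for each frame of a channel pair."""
    feats = []
    for start in range(0, len(local) - frame_len + 1, hop):
        a = local[start:start + frame_len]
        b = other[start:start + frame_len]
        feats.append([kurtosis(a), max_norm_xcorr(a, b, max_lag)])
    return feats
```

A sparse, peaky frame (typical of close-talking speech) yields positive excess kurtosis, whereas a square-wave-like frame yields negative kurtosis. A high peak cross-correlation between a participant's channel and another channel suggests that the same source is reaching both microphones. In a full system, vectors like these would be passed to a classifier such as the GMM or ergodic HMM described above; neither classifier is implemented here.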
Pages: 84-91
Number of pages: 8