Speech and crosstalk detection in multichannel audio

Cited by: 67
Authors
Wrigley, SN [1]
Brown, GJ
Wan, V
Renals, S
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9LW, Midlothian, Scotland
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005, Vol. 13, No. 1
Keywords
crosstalk; cochannel interference; meetings; feature extraction; hidden Markov models (HMM); speech recognition
DOI
10.1109/TSA.2004.838531
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The analysis of scenarios in which a number of microphones record the activity of speakers, such as in a round-table meeting, presents a number of computational challenges. For example, if each participant wears a microphone, speech from both the microphone's wearer (local speech) and from other participants (crosstalk) is received. The recorded audio can be broadly classified in four ways: local speech, crosstalk plus local speech, crosstalk alone and silence. We describe two experiments related to the automatic classification of audio into these four classes. The first experiment attempted to optimize a set of acoustic features for use with a Gaussian mixture model (GMM) classifier. A large set of potential acoustic features were considered, some of which have been employed in previous studies. The best-performing features were found to be kurtosis, "fundamentalness," and cross-correlation metrics. The second experiment used these features to train an ergodic hidden Markov model classifier. Tests performed on a large corpus of recorded meetings show classification accuracies of up to 96%, and automatic speech recognition performance close to that obtained using ground truth segmentation.
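Two of the feature families named in the abstract, kurtosis and cross-correlation between channels, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`frame_signal`, `frame_kurtosis`, `max_norm_xcorr`), the framing scheme, and the normalization are assumptions chosen for illustration, and the "fundamentalness" feature is not covered.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (rows). Hypothetical helper."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def frame_kurtosis(frames):
    """Per-frame kurtosis (fourth central moment over squared variance).
    A Gaussian signal gives a value near 3."""
    centered = frames - frames.mean(axis=1, keepdims=True)
    m2 = (centered ** 2).mean(axis=1)
    m4 = (centered ** 4).mean(axis=1)
    return m4 / np.maximum(m2 ** 2, 1e-12)

def max_norm_xcorr(a, b):
    """Peak absolute cross-correlation between two equal-length frames,
    normalized by the frame energies so the result lies in [0, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    if denom == 0.0:
        return 0.0
    return float(np.max(np.abs(np.correlate(a, b, mode="full"))) / denom)

# Synthetic stand-ins for audio (assumption: no real meeting data here).
rng = np.random.default_rng(0)
gaussian = rng.normal(size=100_000)
laplacian = rng.laplace(size=100_000)  # heavier tails than Gaussian

frames = frame_signal(gaussian, frame_len=1000, hop=500)
k_gauss = frame_kurtosis(gaussian[None, :])[0]
k_lap = frame_kurtosis(laplacian[None, :])[0]

# One channel versus a delayed copy of itself, and versus unrelated noise.
chan_a = rng.normal(size=400)
chan_b = np.concatenate([np.zeros(5), chan_a[:-5]])
unrelated = rng.normal(size=400)
peak_delayed = max_norm_xcorr(chan_a, chan_b)
peak_unrelated = max_norm_xcorr(chan_a, unrelated)
```

In a sketch like this, per-frame values computed on each microphone channel would form part of the feature vector passed to a classifier; the choice of frame length, hop, and normalization here is arbitrary.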
Pages: 84-91
Number of pages: 8