Speech and crosstalk detection in multichannel audio

Cited by: 67
Authors
Wrigley, SN [1]
Brown, GJ
Wan, V
Renals, S
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9LW, Midlothian, Scotland
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005 / Vol. 13 / No. 1
Keywords
crosstalk; cochannel interference; meetings; feature extraction; hidden Markov models (HMM); speech recognition
DOI
10.1109/TSA.2004.838531
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The analysis of scenarios in which a number of microphones record the activity of speakers, such as in a round-table meeting, presents a number of computational challenges. For example, if each participant wears a microphone, the microphone receives speech both from its wearer (local speech) and from other participants (crosstalk). The recorded audio can be broadly classified into four classes: local speech, crosstalk plus local speech, crosstalk alone, and silence. We describe two experiments related to the automatic classification of audio into these four classes. The first experiment attempted to optimize a set of acoustic features for use with a Gaussian mixture model (GMM) classifier. A large set of potential acoustic features was considered, some of which have been employed in previous studies. The best-performing features were found to be kurtosis, "fundamentalness," and cross-correlation metrics. The second experiment used these features to train an ergodic hidden Markov model (HMM) classifier. Tests performed on a large corpus of recorded meetings show classification accuracies of up to 96%, and automatic speech recognition performance close to that obtained using ground-truth segmentation.
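The pipeline summarized in the abstract (frame-level features feeding an ergodic HMM over the four classes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it computes two of the named features (excess kurtosis of a frame and a peak normalized cross-correlation between two channels) and decodes a four-state ergodic (fully connected) HMM with the Viterbi algorithm. The function names, the hypothetical self-transition probability `p_stay=0.9`, the uniform initial distribution, and the assumption that per-class log-likelihoods (for example, from per-class GMMs) are already available are all illustrative choices; "fundamentalness" is omitted for brevity.

```python
import math
from typing import List, Sequence, Tuple

# The four classes named in the abstract; this ordering is an arbitrary choice.
CLASSES = ["local speech", "crosstalk plus local speech", "crosstalk alone", "silence"]


def excess_kurtosis(frame: Sequence[float]) -> float:
    """Excess kurtosis of one frame (0 for Gaussian-distributed samples)."""
    n = len(frame)
    mean = sum(frame) / n
    m2 = sum((x - mean) ** 2 for x in frame) / n
    m4 = sum((x - mean) ** 4 for x in frame) / n
    if m2 == 0.0:
        return 0.0  # constant frame: define kurtosis as 0 (assumption)
    return m4 / (m2 * m2) - 3.0


def peak_xcorr(a: Sequence[float], b: Sequence[float], max_lag: int) -> Tuple[float, int]:
    """Peak normalized cross-correlation of two equal-length channel frames.

    Returns (peak value, lag) over lags in [-max_lag, max_lag]; a positive
    lag means b is a delayed copy of a.
    """
    energy = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if energy == 0.0:
        return 0.0, 0
    n = len(a)
    best, best_lag = -math.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        s = sum(a[i] * b[i + lag] for i in range(n) if 0 <= i + lag < n)
        if s / energy > best:
            best, best_lag = s / energy, lag
    return best, best_lag


def viterbi_ergodic(loglik: List[List[float]], p_stay: float = 0.9) -> List[int]:
    """Most likely class sequence under a fully connected (ergodic) HMM.

    loglik[t][k] is log p(features at frame t | class k), assumed to come from
    per-class models such as GMMs (not shown). Every state can reach every
    other state; p_stay and the uniform start distribution are assumptions.
    """
    K = len(loglik[0])
    log_stay = math.log(p_stay)
    log_move = math.log((1.0 - p_stay) / (K - 1))

    def trans(j: int, k: int) -> float:
        # Log transition probability from state j to state k.
        return log_stay if j == k else log_move

    delta = [-math.log(K) + loglik[0][k] for k in range(K)]
    backptrs: List[List[int]] = []
    for t in range(1, len(loglik)):
        # Best predecessor of each state k, then the updated path scores.
        ptr = [max(range(K), key=lambda j: delta[j] + trans(j, k)) for k in range(K)]
        delta = [delta[ptr[k]] + trans(ptr[k], k) + loglik[t][k] for k in range(K)]
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(K), key=lambda k: delta[k])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

For example, `viterbi_ergodic([[0, -5, -5, -5], [-0.5, 0, -5, -5], [0, -5, -5, -5]])` returns `[0, 0, 0]`: the middle frame alone slightly favors class 1, but the high self-transition probability keeps the decoded segment in class 0. This temporal smoothing is the main benefit a sequence model adds over classifying each frame independently.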
Pages: 84-91
Page count: 8
Related papers
  • [41] Virtual Microphones for Multichannel Audio Resynthesis
    Athanasios Mouchtaris
    Shrikanth S. Narayanan
    Chris Kyriakakis
    EURASIP Journal on Advances in Signal Processing, 2003
  • [42] Virtual microphones for multichannel audio applications
    Kyriakakis, C
    Mouchtaris, A
    2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000: 11-14
  • [43] Virtual Microphones for Multichannel Audio Resynthesis
    Mouchtaris, A
    Hindawi Publishing Corporation, 2003
  • [44] QRMA: quantum representation of multichannel audio
    Engin Şahin
    İhsan Yilmaz
    Quantum Information Processing, 2019, 18
  • [45] A symposium on multichannel audio for radio broadcasters
    Author unknown
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2004, 52 (10): 1066-+
  • [46] EVALUATING THE ROBUSTNESS OF PRIVACY-SENSITIVE AUDIO FEATURES FOR SPEECH DETECTION IN PERSONAL AUDIO LOG SCENARIOS
    Parthasarathi, Sree Hari Krishnan
    Magimai-Doss, Mathew
    Bourlard, Herve
    Gatica-Perez, Daniel
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010: 4474-4477
  • [47] A Robust Audio-visual Speech Recognition Using Audio-visual Voice Activity Detection
    Tamura, Satoshi
    Ishikawa, Masato
    Hashiba, Takashi
    Takeuchi, Shin'ichi
    Hayamizu, Satoru
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010: 2702-+
  • [48] Automatic hate speech detection in audio using machine learning algorithms
    Imbwaga, JL
    Chittaragi, NB
    Koolagudi, SG
    International Journal of Speech Technology, 2024, 27 (02): 447-469
  • [49] Extraction of audio features specific to speech production for multimodal speaker detection
    Besson, Patricia
    Popovici, Vlad
    Vesin, Jean-Marc
    Thiran, Jean-Philippe
    Kunt, Murat
    IEEE TRANSACTIONS ON MULTIMEDIA, 2008, 10 (01): 63-73
  • [50] FASTAUDIO: A LEARNABLE AUDIO FRONT-END FOR SPOOF SPEECH DETECTION
    Fu, Quchen
    Teng, Zhongwei
    White, Jules
    Powell, Maria E.
    Schmidt, Douglas C.
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 3693-3697