Speaker Detection in Audio Stream via Probabilistic Prediction Using Generalized GEBI

Cited by: 1
Authors
Sakata, Koki [1]
Sakashita, Shota [1]
Matsuo, Kazuya [1]
Kurogi, Shuichi [1]
Affiliations
[1] Kyushu Inst Technol, Kitakyushu, Fukuoka 8048550, Japan
Keywords
Probabilistic prediction; Speaker detection; Generalized Gibbs-distribution-based extended Bayesian inference;
DOI
10.1007/978-3-319-46681-1_37
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a method of speaker detection using probabilistic prediction, which avoids the tuning of thresholds to detect a speaker in an audio stream. We introduce g-GEBI (generalized GEBI) as a generalization of BI (Bayesian Inference) and GEBI (Gibbs-distribution-based Extended BI) to execute iterative detection of a speaker in an audio stream uttered by more than one speaker. Then, we show a method of probabilistic prediction in multiclass classification to classify the results of speaker detection. By means of numerical experiments using recorded real speech data, we examine the properties and the effectiveness of the present method. In particular, we show that g-GEBI and g-BI (generalized BI) are more effective than the conventional BI and GEBI in the incremental speaker detection task.
Pages: 302-311
Page count: 10
Related papers
50 in total
  • [21] Robust Audio-Visual Speech Synchrony Detection by Generalized Bimodal Linear Prediction
    Kumar, Kshitiz
    Navratil, Jiri
    Marcheret, Etienne
    Libal, Vit
    Potamianos, Gerasimos
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2219 - +
  • [22] USING ENHANCED F0-TRAJECTORIES FOR MULTIPLE SPEAKER DETECTION IN AUDIO MONITORING SCENARIOS
    Cornaggia-Urrigshardt, Alessia
    Kurth, Frank
    2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 1093 - 1097
  • [23] Data Augmentation Using McAdams-Coefficient-Based Speaker Anonymization for Fake Audio Detection
    Li, Kai
    Li, Sheng
    Lu, Xugang
    Akagi, Masato
    Liu, Meng
    Zhang, Lin
    Zeng, Chang
    Wang, Longbiao
    Dang, Jianwu
    Unoki, Masashi
    INTERSPEECH 2022, 2022, : 664 - 668
  • [24] Video based online behavior detection using probabilistic multi stream fusion
    Arsic, D
    Wallhoff, F
    Schuller, B
    Rigoll, G
    2005 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), VOLS 1-5, 2005, : 2041 - 2044
  • [25] Video based online behavior detection using probabilistic multi stream fusion
    Arsic, D
    Wallhoff, F
    Schuller, B
    Rigoll, G
    2005 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1 AND 2, 2005, : 1355 - 1358
  • [26] Speaker Verification via Estimating Total Variability Space Using Probabilistic Partial Least Squares
    Chen, Chen
    Han, Jiqing
    Pan, Yilin
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1537 - 1541
  • [27] WLAN Channel Status Duration Prediction for Audio and Video Services Using Probabilistic Neural Networks
    Hou, Yafei
    Denno, Satoshi
    IEEE Access, 2024, 12 : 28201 - 28211
  • [29] Audio-visual speaker identification via adaptive fusion using reliability estimates of both modalities
    Fox, NA
    O'Mullane, BA
    Reilly, RB
    AUDIO AND VIDEO BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2005, 3546 : 787 - 796
  • [30] Optimized technique for speaker changes detection in multispeaker audio recording using pyknogram and efficient distance metric
    Kaur, Sukhvinder
    Prabha, Chander
    Singh, Ravinder Pal
    Gupta, Deepali
    Juneja, Sapna
    Gupta, Punit
    Nauman, Ali
    PLOS ONE, 2024, 19 (11):