Speaker Detection in Audio Stream via Probabilistic Prediction Using Generalized GEBI

Cited by: 1
Authors
Sakata, Koki [1]
Sakashita, Shota [1]
Matsuo, Kazuya [1]
Kurogi, Shuichi [1]
Affiliations
[1] Kyushu Inst Technol, Kitakyushu, Fukuoka 8048550, Japan
Keywords
Probabilistic prediction; Speaker detection; Generalized Gibbs-distribution-based extended Bayesian inference
DOI
10.1007/978-3-319-46681-1_37
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a speaker detection method that uses probabilistic prediction to avoid tuning thresholds when detecting a speaker in an audio stream. We introduce g-GEBI (generalized GEBI) as a generalization of BI (Bayesian inference) and GEBI (Gibbs-distribution-based extended BI) to perform iterative detection of a speaker in an audio stream uttered by more than one speaker. We then show a method of probabilistic prediction in multiclass classification for classifying the results of speaker detection. Numerical experiments on recorded real speech data examine the properties and effectiveness of the method. In particular, we show that g-GEBI and g-BI (generalized BI) are more effective than conventional BI and GEBI in the incremental speaker detection task.
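As a rough illustration of the conventional BI baseline mentioned in the abstract, the sketch below applies a recursive Bayesian update over speaker hypotheses, one audio segment at a time. It is not the paper's g-GEBI or GEBI, whose definitions are not given in this record. The function names, the uniform prior, and the per-segment likelihood values are all assumptions made for illustration only.

```python
def bayes_update(prior, likelihood):
    """One Bayesian inference step: posterior is proportional to prior * likelihood."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total <= 0.0:
        raise ValueError("likelihoods give zero total probability")
    return [u / total for u in unnormalized]


def posterior_history(segment_likelihoods, n_speakers):
    """Update the speaker posterior segment by segment, starting from a uniform prior."""
    posterior = [1.0 / n_speakers] * n_speakers  # assumed uniform prior
    history = []
    for likelihood in segment_likelihoods:
        posterior = bayes_update(posterior, likelihood)
        history.append(posterior)
    return history


# Hypothetical likelihoods p(segment | speaker) for three speakers over three segments.
segment_likelihoods = [
    [0.5, 0.3, 0.2],
    [0.6, 0.2, 0.2],
    [0.7, 0.2, 0.1],
]
history = posterior_history(segment_likelihoods, n_speakers=3)

# The detected speaker is the one with the highest posterior after the last segment.
detected = max(range(3), key=lambda k: history[-1][k])
print(detected, history[-1])
```

In this sketch the decision is the index of the largest posterior probability rather than a comparison of a raw score against a hand-tuned threshold, which is one plausible reading of how probabilistic prediction sidesteps threshold tuning.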
Pages: 302-311
Number of pages: 10
Related papers
50 in total
  • [31] Deepfake Audio Detection via MFCC Features Using Machine Learning
    Hamza, Ameer
    Javed, Abdul Rehman
    Iqbal, Farkhund
    Kryvinska, Natalia
    Almadhor, Ahmad S. S.
    Jalil, Zunera
    Borghol, Rouba
    IEEE ACCESS, 2022, 10 : 134018 - 134028
  • [32] Deepfake Audio Detection via MFCC Features Using Machine Learning
    Hamza, Ameer
    Javed, Abdul Rehman
    Iqbal, Farkhund
    Kryvinska, Natalia
    Almadhor, Ahmad S.
    Jalil, Zunera
    Borghol, Rouba
    IEEE Access, 2022, 10 : 134018 - 134028
  • [33] Shorter latency of real-time epileptic seizure detection via probabilistic prediction
    Xu, Yankun
    Yang, Jie
    Ming, Wenjie
    Wang, Shuang
    Sawan, Mohamad
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 236
  • [34] Audio coding for representation in MIDI via pitch detection using harmonic dictionaries
    Sieger, NJ
    Tewfik, AH
    JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 1998, 20 (1-2): : 45 - 59
  • [35] Audio coding for representation in MIDI via pitch detection using harmonic dictionaries
    J VLSI Signal Process Syst Signal Image Video Technol, 1-2 (45-59):
  • [36] Audio Coding for Representation in MIDI via Pitch Detection Using Harmonic Dictionaries
    Nicholas J. Sieger
    Ahmed H. Tewfik
    Journal of VLSI signal processing systems for signal, image and video technology, 1998, 20 : 45 - 59
  • [37] Fault Detection using Probabilistic Prediction and Data Fusion on a Bulk Good System
    Arevalo, Fernando
    Mohammed, Tariq
    Schwung, Andreas
    2017 52ND INTERNATIONAL UNIVERSITIES POWER ENGINEERING CONFERENCE (UPEC), 2017,
  • [38] Audio-Visual Speech Synchronization Detection Using a Bimodal Linear Prediction Model
    Kumar, Kshitiz
    Navratil, Jiri
    Marcheret, Etienne
    Libal, Vit
    Ramaswamy, Ganesh
    Potamianos, Gerasimos
    2009 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPR WORKSHOPS 2009), VOLS 1 AND 2, 2009, : 670 - +
  • [39] Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire
    Fan, Zhiyun
    Liang, Zhenlin
    Dong, Linhao
    Liu, Yi
    Zhou, Shiyu
    Cai, Meng
    Zhang, Jun
    Ma, Zejun
    Xu, Bo
    INTERSPEECH 2022, 2022, : 3749 - 3753
  • [40] Using multi-stream hierarchical deep neural network to extract deep audio feature for acoustic event detection
    Li, Yanxiong
    Zhang, Xue
    Jin, Hai
    Li, Xianku
    Wang, Qin
    He, Qianhua
    Huang, Qian
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (01) : 897 - 916