Speaker Detection in Audio Stream via Probabilistic Prediction Using Generalized GEBI

Cited by: 1
Authors
Sakata, Koki [1]
Sakashita, Shota [1]
Matsuo, Kazuya [1]
Kurogi, Shuichi [1]
Affiliations
[1] Kyushu Inst Technol, Kitakyushu, Fukuoka 8048550, Japan
Keywords
Probabilistic prediction; Speaker detection; Generalized Gibbs-distribution-based extended Bayesian inference;
DOI
10.1007/978-3-319-46681-1_37
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a method of speaker detection that uses probabilistic prediction to avoid tuning thresholds when detecting a speaker in an audio stream. We introduce g-GEBI (generalized GEBI) as a generalization of BI (Bayesian Inference) and GEBI (Gibbs-distribution-based Extended BI) to execute iterative detection of a speaker in an audio stream uttered by more than one speaker. Then, we show a method of probabilistic prediction in multiclass classification to classify the results of speaker detection. By means of numerical experiments using recorded real speech data, we examine the properties and the effectiveness of the present method. In particular, we show that g-GEBI and g-BI (generalized BI) are more effective than the conventional BI and GEBI in the incremental speaker detection task.
Pages: 302-311
Number of pages: 10
Related papers
50 in total
  • [1] Audio-visual speaker recognition using time-varying stream reliability prediction
    Chaudhari, UV
    Ramaswamy, GN
    Potamianos, G
    Neti, C
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS: SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO AND ELECTROACOUSTICS MULTIMEDIA SIGNAL PROCESSING, 2003, : 712 - 715
  • [2] AUDIO INPUTS FOR ACTIVE SPEAKER DETECTION AND LOCALIZATION VIA MICROPHONE ARRAY
    Berghi, Davide
    Jackson, Philip J. B.
    2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA, 2023,
  • [3] Speaker detection using multi-speaker audio files for both enrollment and test
    Bonastre, JF
    Meignier, S
    Merlin, T
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS: SPEECH II; INDUSTRY TECHNOLOGY TRACKS; DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS; NEURAL NETWORKS FOR SIGNAL PROCESSING, 2003, : 77 - 80
  • [4] Unsupervised speaker change detection using Probabilistic pattern matching
    Malegaonkar, A.
    Ariyaeeinia, A.
    Sivakumaran, P.
    Fortuna, J.
    IEEE SIGNAL PROCESSING LETTERS, 2006, 13 (08) : 509 - 512
  • [5] Detecting paralinguistic events in audio stream using context in features and probabilistic decisions
    Gupta, Rahul
    Audhkhasi, Kartik
    Lee, Sungbok
    Narayanan, Shrikanth
    COMPUTER SPEECH AND LANGUAGE, 2016, 36 : 72 - 92
  • [6] Generalized Fake Audio Detection via Deep Stable Learning
    Wang, Zhiyong
    Fu, Ruibo
    Wen, Zhengqi
    Xie, Yuankun
    Liu, Yukun
    Wang, Xiaopeng
    Liu, Xuefei
    Li, Yongwei
    Tao, Jianhua
    Qi, Xin
    Lu, Yi
    Shi, Shuchen
    INTERSPEECH 2024, 2024, : 4773 - 4777
  • [7] Active Speaker Detection Using Audio-Visual Sensor Array
    Kheradiya, Jatin
    Reddy, Sandeep C.
    Hegde, Rajesh
    2014 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT), 2014, : 480 - 484
  • [8] Active Speaker Detection Using Audio, Visual, and Depth Modalities: A Survey
    Robi, Siti Nur Aisyah Mohd
    Ariffin, Muhammad Atiff Zakwan Mohd
    Izhar, Mohd Azri Mohd
    Ahmad, Norulhusna
    Kaidi, Hazilah Mad
    IEEE ACCESS, 2024, 12 : 96617 - 96634
  • [9] Speaker position detection system using audio-visual information
    Matsuo, N
    Kitagawa, H
    Nagata, S
    FUJITSU SCIENTIFIC & TECHNICAL JOURNAL, 1999, 35 (02) : 212 - 220
  • [10] Probabilistic Detection Methods for Acoustic Surveillance Using Audio Histograms
    Reddy, M. S. Shankar
    Nathwani, Karan
    Hegde, Rajesh M.
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2015, 34 (06) : 1977 - 1992