Speaker Detection in Audio Stream via Probabilistic Prediction Using Generalized GEBI

Cited by: 1

Authors:

Sakata, Koki [1]

Sakashita, Shota [1]

Matsuo, Kazuya [1]

Kurogi, Shuichi [1]

Affiliation:

[1] Kyushu Inst Technol, Kitakyushu, Fukuoka 8048550, Japan

Source:

NEURAL INFORMATION PROCESSING, ICONIP 2016, PT IV | 2016 / Vol. 9950

Keywords:

Probabilistic prediction; Speaker detection; Generalized Gibbs-distribution-based extended Bayesian inference;

DOI:

10.1007/978-3-319-46681-1_37

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a method of speaker detection that uses probabilistic prediction to avoid tuning thresholds when detecting a speaker in an audio stream. We introduce g-GEBI (generalized GEBI) as a generalization of BI (Bayesian inference) and GEBI (Gibbs-distribution-based extended BI) to iteratively detect a speaker in an audio stream uttered by more than one speaker. We then show a method of probabilistic prediction in multiclass classification to classify the results of speaker detection. Through numerical experiments using recorded real speech data, we examine the properties and effectiveness of the present method. In particular, we show that g-GEBI and g-BI (generalized BI) are more effective than the conventional BI and GEBI in the incremental speaker detection task.
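As a rough illustration of the conventional BI baseline named in the abstract, the sketch below applies a sequential Bayes update over candidate speakers and picks the most probable one without a tuned threshold. It does not implement GEBI or g-GEBI, whose formulas are not given in this record. The function names `bayes_update` and `detect_speaker`, and the per-frame likelihood values, are assumptions made for illustration only.

```python
def bayes_update(prior, likelihoods):
    """One conventional Bayesian inference step over speaker classes.

    prior[s] is the current probability of speaker s; likelihoods[s] is
    the (hypothetical) likelihood of the new audio frame under speaker s.
    """
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        # No speaker explains the frame; keep the previous belief.
        return list(prior)
    return [u / total for u in unnormalized]


def detect_speaker(frame_likelihoods, n_speakers):
    """Iterate the update over frames, starting from a uniform prior."""
    posterior = [1.0 / n_speakers] * n_speakers
    for likelihoods in frame_likelihoods:
        posterior = bayes_update(posterior, likelihoods)
    # Choose the most probable speaker instead of comparing to a threshold.
    best = max(range(n_speakers), key=lambda s: posterior[s])
    return best, posterior


# Hypothetical likelihoods for three frames and three candidate speakers.
frames = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.7, 0.2, 0.1]]
best, posterior = detect_speaker(frames, n_speakers=3)
```

Here `posterior` sums to one after every update, so the decision can be read directly as a probability for each candidate speaker.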

Pages: 302-311

Number of pages: 10
