Speaker Detection in Audio Stream via Probabilistic Prediction Using Generalized GEBI

Cited by: 1

Authors:

Sakata, Koki [1]

Sakashita, Shota [1]

Matsuo, Kazuya [1]

Kurogi, Shuichi [1]

Affiliations:

[1] Kyushu Inst Technol, Kitakyushu, Fukuoka 8048550, Japan

Source:

NEURAL INFORMATION PROCESSING, ICONIP 2016, PT IV | 2016 / Vol. 9950

Keywords:

Probabilistic prediction; Speaker detection; Generalized Gibbs-distribution-based extended Bayesian inference;

DOI:

10.1007/978-3-319-46681-1_37

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104; 0812; 0835; 1405

Abstract:

This paper presents a method of speaker detection that uses probabilistic prediction to avoid tuning the thresholds for detecting a speaker in an audio stream. We introduce g-GEBI (generalized GEBI) as a generalization of BI (Bayesian inference) and GEBI (Gibbs-distribution-based extended BI) to perform iterative detection of a speaker in an audio stream uttered by more than one speaker. We then show a method of probabilistic prediction in multiclass classification to classify the results of speaker detection. Through numerical experiments using recorded real speech data, we examine the properties and the effectiveness of the present method. In particular, we show that g-GEBI and g-BI (generalized BI) are more effective than the conventional BI and GEBI in the incremental speaker detection task.


Pages: 302-311

Number of pages: 10
