Speaker Detection in Audio Stream via Probabilistic Prediction Using Generalized GEBI

Cited by: 1

Authors:

Sakata, Koki [1]

Sakashita, Shota [1]

Matsuo, Kazuya [1]

Kurogi, Shuichi [1]

Affiliation:

[1] Kyushu Inst Technol, Kitakyushu, Fukuoka 8048550, Japan

Source:

NEURAL INFORMATION PROCESSING, ICONIP 2016, PT IV | 2016 / Vol. 9950

Keywords:

Probabilistic prediction; Speaker detection; Generalized Gibbs-distribution-based extended Bayesian inference

DOI:

10.1007/978-3-319-46681-1_37

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper presents a speaker detection method that uses probabilistic prediction to avoid tuning thresholds when detecting a speaker in an audio stream. We introduce g-GEBI (generalized GEBI) as a generalization of BI (Bayesian inference) and GEBI (Gibbs-distribution-based extended BI) to perform iterative detection of a speaker in an audio stream uttered by more than one speaker. We then show a method of probabilistic prediction in multiclass classification for classifying the results of speaker detection. Through numerical experiments on recorded real speech data, we examine the properties and effectiveness of the present method. In particular, we show that g-GEBI and g-BI (generalized BI) are more effective than conventional BI and GEBI in an incremental speaker detection task.

Citation

Pages: 302-311

Page count: 10
