Multi-Speaker Meeting Audio Segmentation

Cited by: 0

Authors:

Nwe, Tin Lay [1]

Dong, Minghui [1]

Khine, Swe Zin Kalayar [1]

Li, Haizhou [1]

Affiliation:

[1] Inst Infocomm Res, Singapore 119613, Singapore

Source:

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 | 2008

Keywords:

meeting transcription; BIC segmentation; bandpass filters; multi-pitch tracking; harmonic analysis;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents the segmentation of multi-speaker meeting audio into four classes: local speech, crosstalk, overlapped speech and non-speech sounds. First, a Bayesian Information Criterion (BIC) segmentation method is used to pre-segment the meeting at speaker change points. Then, harmonicity information is integrated into the acoustic features to differentiate speech from non-speech audio segments. We use cascaded subband filters spread over pitch and harmonic frequency scales to characterize the harmonicity information. Finally, total energy and a multi-pitch tracking algorithm are used to classify speech segments into local speech, overlapped speech and crosstalk. Experiments conducted on a subset of the ICSI meeting corpus show promising results in classifying the four audio types.
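The BIC pre-segmentation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the commonly used Delta-BIC test, which compares one full-covariance Gaussian fitted to a window of feature frames against two Gaussians fitted on either side of a candidate split. The function names (`delta_bic`, `best_change_point`) and the `margin` and `penalty_weight` parameters are hypothetical, and the input is assumed to be a NumPy array of per-frame features (for example cepstral coefficients) of shape (frames, dimensions).

```python
import numpy as np


def _logdet_cov(frames):
    """Log-determinant of the maximum-likelihood full covariance of the frames."""
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    # Small diagonal term (an assumption of this sketch) keeps the matrix well conditioned.
    cov = cov + 1e-6 * np.eye(frames.shape[1])
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for splitting `frames` at index `split`; positive values favor a change."""
    n, d = frames.shape
    n1, n2 = split, n - split
    gain = 0.5 * (
        n * _logdet_cov(frames)
        - n1 * _logdet_cov(frames[:split])
        - n2 * _logdet_cov(frames[split:])
    )
    # Penalty for the extra mean vector and covariance matrix of the two-model hypothesis.
    penalty = 0.5 * penalty_weight * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty


def best_change_point(frames, margin=20, penalty_weight=1.0):
    """Return the split index with the highest positive Delta-BIC, or None if there is none."""
    n = frames.shape[0]
    candidates = list(range(margin, n - margin))
    if not candidates:
        return None
    scores = [delta_bic(frames, i, penalty_weight) for i in candidates]
    best = int(np.argmax(scores))
    return candidates[best] if scores[best] > 0 else None


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for two speakers: the feature distribution shifts at frame 200.
    frames = np.vstack([rng.normal(0.0, 1.0, (200, 4)), rng.normal(5.0, 1.0, (200, 4))])
    print(best_change_point(frames))
```

In practice such a test is usually run over a sliding or growing window across the recording, and `penalty_weight` is tuned on held-out data; both choices are outside what the abstract specifies.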

Citation

Pages: 2522-2525

Page count: 4
