Multi-Speaker Meeting Audio Segmentation

Authors
Nwe, Tin Lay [1]
Dong, Minghui [1]
Khine, Swe Zin Kalayar [1]
Li, Haizhou [1]
Affiliations
[1] Inst Infocomm Res, Singapore 119613, Singapore
Keywords
meeting transcription; BIC segmentation; bandpass filters; multi-pitch tracking; harmonic analysis
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a method for segmenting multi-speaker meeting audio into four classes: local speech, crosstalk, overlapped speech, and non-speech sounds. First, the Bayesian Information Criterion (BIC) segmentation method is used to pre-segment the meeting at speaker change points. Then, harmonicity information is integrated into the acoustic features to differentiate speech from non-speech audio segments. We use cascaded subband filters spaced on pitch and harmonic frequency scales to characterize the harmonicity information. Finally, total energy and a multi-pitch tracking algorithm are used to classify speech segments into local speech, overlapped speech, and crosstalk. Experiments conducted on a subset of the ICSI meeting corpus show promising results in classifying the four audio types.
Pages: 2522-2525
Page count: 4
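
Illustration
The abstract names BIC segmentation as the pre-segmentation step but gives no parameters. As a rough illustration only, the sketch below uses the widely used delta-BIC test for a single change point, modelling each side of a candidate split with one full-covariance Gaussian. It is not confirmed to match the authors' implementation. The function names, the `margin` and `penalty_weight` parameters, and the small covariance regularizer are assumptions introduced for this example.

```python
import numpy as np


def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for splitting `frames` (n x d feature matrix) at index `split`.

    Positive values favour two Gaussians (a change point) over one.
    `penalty_weight` is the usual tunable lambda; 1.0 is an assumed default.
    """
    n, d = frames.shape

    def logdet_cov(x):
        # Maximum-likelihood covariance; a tiny ridge (an assumption made
        # here) keeps the determinant finite for short segments.
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
        cov = cov + 1e-6 * np.eye(d)
        _, value = np.linalg.slogdet(cov)
        return value

    left, right = frames[:split], frames[split:]
    gain = 0.5 * (
        n * logdet_cov(frames)
        - len(left) * logdet_cov(left)
        - len(right) * logdet_cov(right)
    )
    # Extra parameters of the two-model hypothesis: one mean vector plus
    # one symmetric covariance matrix.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


def best_change_point(frames, margin=20, penalty_weight=1.0):
    """Return (index, score) of the best split, or (None, score) if no
    candidate split has a positive delta-BIC."""
    n = frames.shape[0]
    candidates = np.arange(margin, n - margin)
    scores = np.array([delta_bic(frames, i, penalty_weight) for i in candidates])
    best = int(np.argmax(scores))
    score = float(scores[best])
    if score > 0:
        return int(candidates[best]), score
    return None, score
```

In practice the test is usually slid over the recording in overlapping windows of acoustic feature frames such as MFCCs, and each accepted change point starts a new window. The windowing strategy is left out here because the abstract does not describe it.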