Candidate Speech Extraction from Multi-speaker Single-Channel Audio Interviews

Cited by: 0
Authors
Pandharipande, Meghna [1]
Kopparapu, Sunil Kumar [1]
Affiliations
[1] Tata Consultancy Serv Ltd, TCS Res, Mumbai, Maharashtra, India
Keywords
Speaker diarization; Binary classifier; Candidate speech; Late fusion
DOI
10.1007/978-3-031-48309-7_18
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Video interviews have become increasingly common, initially because of travel restrictions during the pandemic and now for economic reasons. However, in many developing nations, limited bandwidth makes video interviews impractical, leading to a preference for telephonic interviews. These interviews are often recorded for audit and analysis purposes. A candidate's performance in an interview depends not only on their knowledge but also on how they respond to the interviewer. Both the content of their responses and their communication skills influence the overall selection process. Any downstream interview analysis or automation therefore needs a reliable method to identify the speech segments spoken by the candidate in a multi-speaker, single-channel interview conversation. In this paper, we propose a pipeline to accurately identify candidate speech segments. This pre-processing step is crucial for analyzing various aspects of a candidate's interview performance, such as answer analysis, confidence level, and emotional expression.
Pages: 210-221
Page count: 12