Automatic detection of multi-speaker fragments with high time resolution

Cited by: 4
Authors
Kazimirova, E. [1 ]
Belyaev, A. [1 ,2 ]
Affiliations
[1] Neurodatalab, Miami, FL 33137 USA
[2] Lomonosov MSU, Moscow, Russia
Keywords
multi-speaker detection; convolutional neural network; harmonics analysis; audio segmentation; overlapped speech; interruption; conversational analysis; histogram equalization; speaker diarization
DOI
10.21437/Interspeech.2018-1878
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Interruptions and simultaneous talking are important patterns of speech behavior, yet approaches to their automatic detection in continuous audio data are scarce. We have developed a solution for automatic labeling of multi-speaker fragments based on harmonic trace analysis. Because harmonic traces in multi-speaker intervals form an irregular pattern, in contrast to the structured pattern typical of a single speaker, we apply computer vision methods to detect multi-speaker fragments. A convolutional neural network was trained on synthetic material to differentiate between single-speaker and multi-speaker fragments. The proposed method was evaluated on the SSPNet Conflict Corpus using its manual diarization, and we also examined factors affecting algorithm performance. The main advantages of the method are computational simplicity and high time resolution: segments as short as 0.5 seconds can be detected. The method demonstrates highly accurate results and may be used for speech segmentation, speaker tracking, content analysis such as conflict detection, and other practical purposes.
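The abstract describes classifying short spectrogram windows as single-speaker or multi-speaker with a CNN. The sketch below is only an illustration of that idea under stated assumptions: the paper's harmonic-trace extraction, window and STFT settings, and network architecture are not given in the abstract, so a plain log-magnitude spectrogram stands in for the harmonic-trace representation, and the names (OverlapCNN, spectrogram_patch, "recording.wav"), the 16 kHz sampling rate, and all layer sizes are hypothetical. Only the 0.5 s window length is taken from the abstract.

# Illustrative sketch, not the authors' implementation (see caveats above).
import numpy as np
import librosa
import torch
import torch.nn as nn

SR = 16000          # assumed sampling rate
WIN_SEC = 0.5       # minimum detectable segment length reported in the abstract

def spectrogram_patch(wave: np.ndarray) -> torch.Tensor:
    """Log-magnitude STFT of a short window, treated as a single-channel image."""
    spec = np.abs(librosa.stft(wave, n_fft=512, hop_length=128))
    logspec = librosa.amplitude_to_db(spec, ref=np.max)
    return torch.from_numpy(logspec).float().unsqueeze(0)  # shape (1, freq, time)

class OverlapCNN(nn.Module):
    """Small binary classifier: single-speaker vs. multi-speaker patch (hypothetical architecture)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

# Usage sketch: slide a 0.5 s window over a recording and label each patch.
# wave, _ = librosa.load("recording.wav", sr=SR)   # hypothetical input file
# model = OverlapCNN()                             # would be trained on synthetic mixtures
# step = int(0.25 * SR)
# win = int(WIN_SEC * SR)
# for start in range(0, len(wave) - win, step):
#     patch = spectrogram_patch(wave[start:start + win])
#     label = model(patch.unsqueeze(0)).argmax(dim=1).item()  # 0 = single, 1 = overlap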
Pages: 1388 - 1392
Page count: 5