Singer Diarization for Polyphonic Music With Unison Singing

Cited by: 1
Authors
Suda, Hitoshi [1]
Saito, Daisuke [1]
Fukayama, Satoru [2]
Nakano, Tomoyasu [2]
Goto, Masataka [2]
Affiliations
[1] Univ Tokyo, Dept Engn, Bunkyo Ku, Tokyo 1138656, Japan
[2] Natl Inst Adv Ind Sci & Technol, Tsukuba, Ibaraki 3058568, Japan
Keywords
Feature extraction; Data mining; Synchronization; Information processing; Voice activity detection; Timbre; Speech analysis; Music information processing; music information retrieval; singer diarization; unison singing; SPEAKER DIARIZATION; DATABASE
DOI
10.1109/TASLP.2022.3166262
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper introduces a new framework for singer diarization, which is a technique to reveal who sings when in songs with multiple singers. Although various techniques have been developed to analyze and extract features of singing voices in musical audio signals, most of them assume that a song is sung by a single singer, and singer diarization for multiple singers has not been well studied in the field of singing information processing. To deal with multiple speakers in speech analysis, speaker diarization has been explored to handle overlapped speech voices, but cannot handle singing voices well because of acoustic differences between singing and speech voices. This paper therefore proposes a new diarization framework specialized in singing voices. To achieve high accuracy in overlap detection, this paper proposes a novel acoustic feature named Cosacorr score, which is helpful in estimating whether a song is sung by more than one singer. After extracting singing voices from polyphonic music by using a singing voice separation technique, the framework adopts an existing ArcFace technique to extract discriminative singer representations from short segments of the separated singing voices. The framework is evaluated by using a new private dataset of unison singing voices, which is constructed using commercially available compact discs (CDs). The experimental results show that the proposed framework outperformed the baseline method for speaker diarization in terms of diarization error rate (DER).
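The abstract reports results as diarization error rate (DER) on songs where singers overlap. The following is a minimal sketch of a frame-level DER, not the paper's evaluation code. It assumes the recording has already been cut into equal-length frames, that hypothesis labels have already been mapped to reference singer names, and that no forgiveness collar is applied. The function name `frame_level_der` and the set-per-frame representation are hypothetical choices for illustration.

```python
from typing import List, Set


def frame_level_der(reference: List[Set[str]], hypothesis: List[Set[str]]) -> float:
    """Frame-level diarization error rate with overlap support.

    Each list entry is the set of singers active in one frame.
    Assumption: hypothesis labels are already mapped to reference labels.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")

    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)
        # Reference singers with no hypothesis counterpart in this frame.
        missed += max(0, n_ref - n_hyp)
        # Hypothesis singers in excess of the reference count.
        false_alarm += max(0, n_hyp - n_ref)
        # Matched slots that carry the wrong singer label.
        confusion += min(n_ref, n_hyp) - n_correct
        total += n_ref

    if total == 0:
        raise ValueError("reference contains no active singer frames")
    return (missed + false_alarm + confusion) / total


# Toy example: singers A and B sing in unison in the second frame.
reference = [{"A"}, {"A", "B"}, {"B"}, set()]
hypothesis = [{"A"}, {"A"}, {"A"}, {"B"}]
print(frame_level_der(reference, hypothesis))  # 0.75
```

In the toy example the second frame contributes one missed singer, the third one confusion, and the fourth one false alarm, giving 3 errors over 4 reference singer-frames. Counting errors per singer rather than per frame is what lets unison passages be penalized when only one of the overlapping singers is detected.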
Pages: 1531-1545
Number of pages: 15