Singer Diarization for Polyphonic Music With Unison Singing

Cited by: 1

Authors:

Suda, Hitoshi [1]

Saito, Daisuke [1]

Fukayama, Satoru [2]

Nakano, Tomoyasu [2]

Goto, Masataka [2]

Affiliations:

[1] Univ Tokyo, Dept Engn, Bunkyo Ku, Tokyo 1138656, Japan

[2] Natl Inst Adv Ind Sci & Technol, Tsukuba, Ibaraki 3058568, Japan

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2022 / Vol. 30

Keywords:

Feature extraction; Data mining; Synchronization; Information processing; Voice activity detection; Timbre; Speech analysis; Music information processing; music information retrieval; singer diarization; unison singing; SPEAKER DIARIZATION; DATABASE;

DOI:

10.1109/TASLP.2022.3166262

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper introduces a new framework for singer diarization, which is a technique to reveal who sings when in songs with multiple singers. Although various techniques have been developed to analyze and extract features of singing voices in musical audio signals, most of them assume that a song is sung by a single singer, and singer diarization for multiple singers has not been well studied in the field of singing information processing. To deal with multiple speakers in speech analysis, speaker diarization has been explored to handle overlapped speech voices, but cannot handle singing voices well because of acoustic differences between singing and speech voices. This paper therefore proposes a new diarization framework specialized in singing voices. To achieve high accuracy in overlap detection, this paper proposes a novel acoustic feature named Cosacorr score, which is helpful in estimating whether a song is sung by more than one singer. After extracting singing voices from polyphonic music by using a singing voice separation technique, the framework adopts an existing ArcFace technique to extract discriminative singer representations from short segments of the separated singing voices. The framework is evaluated by using a new private dataset of unison singing voices, which is constructed using commercially available compact discs (CDs). The experimental results show that the proposed framework outperformed the baseline method for speaker diarization in terms of diarization error rate (DER).
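The abstract reports results in terms of diarization error rate (DER) but does not spell out how it is computed. As a rough illustration only, the sketch below computes a frame-level DER in the commonly used form (missed + false alarm + confusion) / total reference singing time, with overlapping (unison) frames represented as sets of singer labels. The function name `frame_der`, the frame-set representation, and the simplifications (hypothesis labels already mapped to reference labels, no forgiveness collar) are assumptions made for this example, not the paper's evaluation protocol.

```python
from typing import List, Set


def frame_der(reference: List[Set[str]], hypothesis: List[Set[str]]) -> float:
    """Frame-level diarization error rate.

    Each list element is the set of singers active in one frame; an empty
    set means no singing. Assumes hypothesis labels are already mapped to
    reference labels and that no collar is applied (illustrative choices).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")

    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)
        # Reference singers with no hypothesis slot at all.
        missed += max(0, n_ref - n_hyp)
        # Hypothesis singers beyond the number actually singing.
        false_alarm += max(0, n_hyp - n_ref)
        # Slots that are filled, but with the wrong singer.
        confusion += min(n_ref, n_hyp) - n_correct
        total += n_ref

    if total == 0:
        raise ValueError("reference contains no singing frames")
    return (missed + false_alarm + confusion) / total


# Hypothetical four-frame example: frame 2 is unison singing by A and B.
reference = [{"A"}, {"A", "B"}, {"B"}, set()]
hypothesis = [{"A"}, {"A"}, {"A"}, {"B"}]
print(frame_der(reference, hypothesis))  # 0.75
```

In this toy example the second frame contributes one missed singer, the third one confusion, and the fourth one false alarm, against four reference singer-frames in total. Counting overlapped frames once per active singer is what makes missed overlaps costly under this metric.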

Pages: 1531-1545

Page count: 15
