Modeling Timbre Similarity of Short Music Clips

Times Cited: 3
Authors:
Siedenburg, Kai [1]
Mullensiefen, Daniel [2]
Affiliations:
[1] Carl von Ossietzky Univ Oldenburg, Dept Med Phys & Acoust, Oldenburg, Germany
[2] Goldsmiths Univ London, Dept Psychol, London, England
Source:
FRONTIERS IN PSYCHOLOGY, 2017, Vol. 8
Keywords:
short audio clips; music similarity; timbre; audio features; genre; LEAST-SQUARES REGRESSION; SOUNDS; DISSIMILARITY; DESCRIPTORS; RECOGNITION; EXCERPTS
DOI:
10.3389/fpsyg.2017.00639
Chinese Library Classification (CLC): B84 [Psychology]
Subject Classification Codes: 04; 0402
Abstract:
There is evidence from a number of recent studies that most listeners are able to extract information related to song identity, emotion, or genre from music excerpts with durations in the range of tenths of seconds. Because of these very short durations, timbre, as a multifaceted auditory attribute, appears to be a plausible candidate for the type of features that listeners make use of when processing short music excerpts. However, the importance of timbre in listening tasks that involve short excerpts has not yet been demonstrated empirically. Hence, the goal of this study was to develop a method for exploring to what degree similarity judgments of short music clips can be modeled with low-level acoustic features related to timbre. We utilized similarity data from two large samples of participants: Sample I was obtained via an online survey, used 16 clips of 400 ms length, and contained responses from 137,339 participants. Sample II was collected in a lab environment, used 16 clips of 800 ms length, and contained responses from 648 participants. Our model used two sets of audio features, comprising commonly used timbre descriptors and the well-known Mel-frequency cepstral coefficients as well as their temporal derivatives. In order to predict pairwise similarities, the resulting distances between clips in terms of their audio features were used as predictor variables in partial least-squares regression. We found that a sparse selection of three to seven features from both descriptor sets, mainly encoding the coarse shape of the spectrum as well as spectrotemporal variability, best predicted similarities across the two sets of sounds. Notably, the inclusion of non-acoustic predictors of musical genre and record release date allowed much better generalization performance and explained up to 50% of shared variance (R^2) between observations and model predictions. Overall, the results of this study empirically demonstrate that both acoustic features related to timbre and higher-level categorical features such as musical genre play a major role in the perception of short music clips.
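
The pipeline summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes librosa for feature extraction and scikit-learn's PLSRegression, uses a small, arbitrary set of timbre descriptors alongside MFCCs and their temporal derivatives, takes per-feature absolute differences as the pairwise distances, and substitutes placeholder file names and random ratings for the actual clips and similarity data.

# Minimal sketch (assumptions noted above): predict pairwise similarity of
# short clips from distances between their audio-feature summaries via
# partial least-squares (PLS) regression.

from itertools import combinations

import numpy as np
import librosa
from sklearn.cross_decomposition import PLSRegression


def clip_features(path, sr=22050):
    """Summarize one clip with a few timbre descriptors, MFCCs, and MFCC deltas."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    d_mfcc = librosa.feature.delta(mfcc)                  # temporal derivatives
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    flatness = librosa.feature.spectral_flatness(y=y)
    zcr = librosa.feature.zero_crossing_rate(y)
    frames = np.vstack([mfcc, d_mfcc, centroid, flatness, zcr])
    return frames.mean(axis=1)                            # one summary value per feature


# Hypothetical inputs: 16 clip files and one observed similarity rating per pair.
clip_paths = [f"clip_{i:02d}.wav" for i in range(16)]
features = np.array([clip_features(p) for p in clip_paths])

pairs = list(combinations(range(len(clip_paths)), 2))     # 120 pairs for 16 clips
X = np.array([np.abs(features[i] - features[j]) for i, j in pairs])
similarity = np.random.rand(len(pairs))                   # placeholder; use real mean ratings

# Regress pairwise similarities on per-feature distances with PLS.
pls = PLSRegression(n_components=3)
pls.fit(X, similarity)
print(f"In-sample R^2: {pls.score(X, similarity):.2f}")   # the paper reports cross-validated R^2

In the study itself, feature selection and cross-validated generalization are the decisive steps; the in-sample R^2 printed here merely mirrors the kind of variance-explained statistic quoted in the abstract.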
Pages: 12
Related Papers (50 total)
  • [1] Perceptual Dimensions of Short Audio Clips and Corresponding Timbre Features
    Musil, Jason Jiri
    Elnusairi, Budr
    Mullensiefen, Daniel
    FROM SOUNDS TO MUSIC AND EMOTIONS, 2013, 7900: 214-227
  • [2] SHORT-TERM RECOGNITION OF TIMBRE SEQUENCES: MUSIC TRAINING, PITCH VARIABILITY, AND TIMBRAL SIMILARITY
    Siedenburg, Kai
    McAdams, Stephen
    MUSIC PERCEPTION, 2018, 36 (01): 24-39
  • [3] A model for rhythm and timbre similarity in electronic dance music
    Panteli, Maria
    Rocha, Bruno
    Bogaards, Niels
    Honingh, Aline
    MUSICAE SCIENTIAE, 2017, 21 (03): 338-361
  • [4] Perception of Timbre and Rhythm Similarity in Electronic Dance Music
    Honingh, Aline
    Panteli, Maria
    Brockmeier, Thomas
    Mejia, David Inaki Lopez
    Sadakata, Makiko
    JOURNAL OF NEW MUSIC RESEARCH, 2015, 44 (04): 373-390
  • [5] Perception of Timbre and Rhythm Similarity in Electronic Dance Music (vol 44, 2015)
    Honingh, A.
    Panteli, M.
    Brockmeier, T.
    Mejia, Inaki Lopez D.
    Sadakata, M.
    JOURNAL OF NEW MUSIC RESEARCH, 2015, 44 (04)
  • [6] Modeling timbre distance with temporal statistics from polyphonic music
    Mörchen, F
    Ultsch, A
    Thies, M
    Löhken, I
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (01): 81-90
  • [7] ENHANCING TIMBRE MODEL USING MFCC AND ITS TIME DERIVATIVES FOR MUSIC SIMILARITY ESTIMATION
    de Leon, Franz
    Martinez, Kirk
    2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012: 2005-2009
  • [8] A Modeling of Singing Voice Robust to Accompaniment Sounds and Its Application to Singer Identification and Vocal-Timbre-Similarity-Based Music Information Retrieval
    Fujihara, Hiromasa
    Goto, Masataka
    Kitahara, Tetsuro
    Okuno, Hiroshi G.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (03): 638-648
  • [9] Timbre Preferences in the Context of Mixing Music
    Dobrowohl, Felix A.
    Milne, Andrew J.
    Dean, Roger T.
    APPLIED SCIENCES-BASEL, 2019, 9 (08)
  • [10] Towards a theory of timbre for music analysis
    Tsang, L
    MUSICAE SCIENTIAE, 2002, 6 (01): 23-52