A Mid-Level Representation for Melody-Based Retrieval in Audio Collections

Cited by: 25
Authors
Marolt, Matija [1]
Affiliation
[1] Univ Ljubljana, Fac Comp & Informat Sci, Ljubljana 1000, Slovenia
Keywords
Audio collections; information retrieval; melody; music
DOI
10.1109/TMM.2008.2007293
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Searching audio collections using high-level musical descriptors is a difficult problem due to the lack of reliable methods for extracting melody, harmony, rhythm, and other such descriptors from unstructured audio signals. In this paper, we present a novel approach to melody-based retrieval in audio collections. Our approach supports both audio and symbolic queries and ranks results according to their melodic similarity to the query. We introduce a beat-synchronous melodic representation consisting of salient melodic lines, which are extracted from the analyzed audio signal. We propose the use of a 2-D shift-invariant transform to extract shift-invariant melodic fragments from the melodic representation and demonstrate how such fragments can be indexed and stored in a song database. An efficient search algorithm based on locality-sensitive hashing is used to perform retrieval according to similarity of melodic fragments. On the cover song detection task, good results are achieved for both audio and symbolic queries, while fast retrieval performance makes the proposed system suitable for retrieval in large databases.
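The two ideas named in the abstract, a 2-D shift-invariant transform and a search based on locality-sensitive hashing, can be illustrated with a minimal sketch. The abstract does not say which transform or hash family the paper uses, so everything below is an assumption chosen for illustration: the magnitude of the 2-D discrete Fourier transform stands in for the shift-invariant transform (it is invariant to circular shifts only, whereas real transpositions and time offsets are not circular), and random-hyperplane sign hashing stands in for the hashing scheme. The names shift_invariant_descriptor, HyperplaneLSH, and the fragment size are hypothetical.

```python
import numpy as np


def shift_invariant_descriptor(fragment):
    """Return a unit-length descriptor of a 2-D (pitch x time) fragment.

    Assumption: the 2-D DFT magnitude is used as the shift-invariant
    transform. It is unchanged by circular shifts along either axis.
    """
    magnitude = np.abs(np.fft.fft2(fragment))
    vec = magnitude.ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


class HyperplaneLSH:
    """Toy locality-sensitive hash index using random hyperplanes.

    Assumption: sign-of-projection hashing, one hash table only.
    """

    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = {}

    def _key(self, vec):
        # One bit per hyperplane: which side of the plane the vector lies on.
        bits = (self.planes @ vec) > 0
        return bits.tobytes()

    def add(self, vec, label):
        self.buckets.setdefault(self._key(vec), []).append(label)

    def query(self, vec):
        # Return labels stored in the bucket the query hashes to.
        return list(self.buckets.get(self._key(vec), []))


# Hypothetical fragment: 12 pitch bins by 32 beat-synchronous frames.
rng = np.random.default_rng(42)
fragment = rng.random((12, 32))
# Circularly shift by 3 pitch bins and 5 frames.
shifted = np.roll(fragment, shift=(3, 5), axis=(0, 1))

d_original = shift_invariant_descriptor(fragment)
d_shifted = shift_invariant_descriptor(shifted)

index = HyperplaneLSH(dim=d_original.size)
index.add(d_original, "song_a")
matches = index.query(d_shifted)
```

In this sketch the shifted fragment yields the same descriptor as the original up to floating-point error, so it hashes to the same bucket and the lookup avoids comparing the query against every stored fragment. A practical index would typically use several hash tables and rank the candidates by an explicit distance.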
Pages: 1617-1625
Number of pages: 9
Related papers
50 items in total
  • [41] SuperPixel based mid-level image description for image recognition
    Tasli, H. Emrah
    Sicre, Ronan
    Gevers, Theo
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2015, 33 : 301 - 308
  • [42] SuperPixel based Angular Differences as a mid-level Image Descriptor
    Sicre, Ronan
    Tasli, H. Emrah
    Gevers, Theo
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 3732 - 3737
  • [43] Investigation on effectiveness of mid-level feature representation for semantic boundary detection in news video
    Radhakrishan, R
    Xiong, Z
    Divakaran, A
    Raj, B
    INTERNET MULTIMEDIA MANAGEMENT SYSTEMS IV, 2003, 5242 : 74 - 80
  • [44] Mid-Level Feature Representation via Sparse Autoencoder for Remotely Sensed Scene Classification
    Li, Erzhu
    Du, Peijun
    Samat, Alim
    Meng, Yaping
    Che, Meiqin
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2017, 10 (03) : 1068 - 1081
  • [45] Strokelets: A Learned Multi-Scale Mid-Level Representation for Scene Text Recognition
    Bai, Xiang
    Yao, Cong
    Liu, Wenyu
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, 25 (06) : 2789 - 2802
  • [46] Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time
    Lee, Yong Jae
    Efros, Alexei A.
    Hebert, Martial
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 1857 - 1864
  • [47] A comprehensive study on mid-level representation and ensemble learning for emotional analysis of video material
    Esra Acar
    Frank Hopfgartner
    Sahin Albayrak
    Multimedia Tools and Applications, 2017, 76 : 11809 - 11837
  • [48] A comprehensive study on mid-level representation and ensemble learning for emotional analysis of video material
    Acar, Esra
    Hopfgartner, Frank
    Albayrak, Sahin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (09) : 11809 - 11837
  • [49] Lyrics-based audio retrieval and multimodal navigation in music collections
    Mueller, Meinard
    Kurth, Frank
    Damm, David
    Fremerey, Christian
    Clausen, Michael
    RESEARCH AND ADVANCED TECHNOLOGY FOR DIGITAL LIBRARIES, PROCEEDINGS, 2007, 4675 : 112 - +
  • [50] Image understanding systems based on the unifying representation of perceptual and conceptual information and the solution of mid-level and high-level vision problems
    Kuvychko, I
    INTELLIGENT ROBOTS AND COMPUTER VISION XX: ALGORITHMS, TECHNIQUES, AND ACTIVE VISION, 2001, 4572 : 247 - 258