A Mid-Level Representation for Melody-Based Retrieval in Audio Collections

Cited by: 25
Author
Marolt, Matija [1]
Affiliation
[1] Univ Ljubljana, Fac Comp & Informat Sci, Ljubljana 1000, Slovenia
Keywords
Audio collections; information retrieval; melody; music
DOI
10.1109/TMM.2008.2007293
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Searching audio collections using high-level musical descriptors is a difficult problem due to the lack of reliable methods for extracting melody, harmony, rhythm, and other such descriptors from unstructured audio signals. In this paper, we present a novel approach to melody-based retrieval in audio collections. Our approach supports both audio and symbolic queries and ranks results according to their melodic similarity to the query. We introduce a beat-synchronous melodic representation consisting of salient melodic lines, which are extracted from the analyzed audio signal. We propose the use of a 2-D shift-invariant transform to extract shift-invariant melodic fragments from the melodic representation and demonstrate how such fragments can be indexed and stored in a song database. An efficient search algorithm based on locality-sensitive hashing is used to perform retrieval according to the similarity of melodic fragments. On the cover song detection task, good results are achieved for audio as well as symbolic queries, while fast retrieval performance makes the proposed system suitable for retrieval in large databases.
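The two ideas named in the abstract, a shift-invariant 2-D transform and locality-sensitive hashing, can be illustrated with a small sketch. The abstract does not say which transform or hash family the paper uses, so everything here is an assumption made for illustration: the magnitude of the 2-D discrete Fourier transform stands in for the shift-invariant transform (it is exactly invariant only to circular shifts), random-hyperplane signatures stand in for the hash, and the function names `dft2_magnitude`, `circular_shift`, `make_hyperplanes`, and `lsh_key` are hypothetical.

```python
import cmath
import random


def dft2_magnitude(patch):
    """Magnitude of the 2-D DFT of a small matrix given as a list of rows.

    Assumption: used here as a stand-in shift-invariant transform; a circular
    shift of the input only changes the phase, not the magnitude.
    """
    rows, cols = len(patch), len(patch[0])
    out = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            acc = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2 * cmath.pi * (u * r / rows + v * c / cols)
                    acc += patch[r][c] * cmath.exp(1j * angle)
            out_row.append(abs(acc))
        out.append(out_row)
    return out


def circular_shift(patch, dr, dc):
    """Shift a matrix circularly by dr rows (pitch) and dc columns (beats)."""
    rows, cols = len(patch), len(patch[0])
    return [
        [patch[(r - dr) % rows][(c - dc) % cols] for c in range(cols)]
        for r in range(rows)
    ]


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(vector, hyperplanes):
    """Random-hyperplane signature: one sign bit per hyperplane."""
    return tuple(
        int(sum(w * x for w, x in zip(plane, vector)) >= 0)
        for plane in hyperplanes
    )


# Toy pitch-by-beat fragment (rows = pitch bins, columns = beats).
fragment = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
features = [x for row in dft2_magnitude(fragment) for x in row]
planes = make_hyperplanes(dim=len(features), n_bits=8)
bucket = lsh_key(features, planes)  # key under which the fragment is indexed
```

In such a scheme, fragments whose feature vectors are close tend to share a bucket key, so a query fragment only needs to be compared against the fragments stored under the same key rather than against the whole database. Real transpositions and time offsets are not circular, so the invariance would hold only approximately in practice.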
Pages: 1617-1625
Number of pages: 9
Related papers
50 in total
  • [21] Mining Multiple Queries for Image Retrieval: On-the-fly learning of an Object-specific Mid-level Representation
    Fernando, Basura
    Tuytelaars, Tinne
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 2544 - 2551
  • [22] An audio representation for content based retrieval
    Melih, K
    Gonzalez, R
    Ogunbona, P
    IEEE TENCON'97 - IEEE REGIONAL 10 ANNUAL CONFERENCE, PROCEEDINGS, VOLS 1 AND 2: SPEECH AND IMAGE TECHNOLOGIES FOR COMPUTING AND TELECOMMUNICATIONS, 1997, : 207 - 210
  • [23] Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection
    Lim, Joseph J.
    Zitnick, C. Lawrence
    Dollar, Piotr
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 3158 - 3165
  • [24] Mid-level Image Representation for Fruit Fly Identification (Diptera: Tephritidae)
    Leonardo, Matheus Macedo
    Avila, Sandra
    Zucchi, Roberto A.
    Faria, Fabio A.
    2017 IEEE 13TH INTERNATIONAL CONFERENCE ON E-SCIENCE (E-SCIENCE), 2017, : 202 - 209
  • [25] Gameplay genre video classification by using mid-level video representation
    de Souza, Renato Augusto
    de Almeida, Raquel Pereira
    Moldovan, Arghir-Nicolae
    do Patrocinio, Zenilton Kleber G., Jr.
    Guimaraes, Silvio Jamil F.
    2016 29TH SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI), 2016, : 188 - 194
  • [26] DEEP NEURAL NETWORK BASED LEARNING AND TRANSFERRING MID-LEVEL AUDIO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION
    Mun, Seongkyu
    Shon, Suwon
    Kim, Wooil
    Han, David K.
    Ko, Hanseok
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 796 - 800
  • [27] Assistive Image Comment Robot-A Novel Mid-Level Concept-Based Representation
    Chen, Yan-Ying
    Chen, Tao
    Liu, Taikun
    Liao, Hong-Yuan Mark
    Chang, Shih-Fu
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2015, 6 (03) : 298 - 311
  • [28] Instrument-specific harmonic atoms for mid-level music representation
    Leveau, Pierre
    Vincent, Emmanuel
    Richard, Gaeel
    Daudet, Laurent
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (01) : 116 - 128
  • [29] Unsupervised Deep Learning of Mid-Level Video Representation for Action Recognition
    Hou, Jingyi
    Wu, Xinxiao
    Chen, Jin
    Luo, Jiebo
    Jia, Yunde
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 6910 - 6917
  • [30] Merging segmentations of low-level and mid-level time series for audio class discovery
    Radhakrishnan, Regunathan
    Divakaran, Ajay
    2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5, 2006, : 64 - +