Efficient audio-driven multimedia indexing through similarity-based speech/music discrimination

Cited by: 18
Authors
Tsipas, Nikolaos [1]
Vrysis, Lazaros [1]
Dimoulas, Charalampos [1]
Papanikolaou, George [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Thessaloniki 54124, Greece
Keywords
Speech/music discrimination; Self-similarity matrix analysis; Transition point detection; Supervised learning; MUSIC;
DOI
10.1007/s11042-016-4315-0
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, an audio-driven algorithm for the detection of speech and music events in multimedia content is introduced. The proposed approach is based on the hypothesis that short-time frame-level discrimination performance can be enhanced by identifying transition points between longer, semantically homogeneous segments of audio. In this context, a two-step segmentation approach is employed in order to initially identify transition points between the homogeneous regions and subsequently classify the derived segments using a supervised binary classifier. The transition point detection mechanism is based on the analysis and composition of multiple self-similarity matrices, generated using different audio feature sets. The implemented technique aims at discriminating events focusing on transition point detection with high temporal resolution, a target that is also reflected in the adopted assessment methodology. Thereafter, multimedia indexing can be efficiently deployed (for both audio and video sequences), incorporating the processes of high resolution temporal segmentation and semantic annotation extraction. The system is evaluated against three publicly available datasets and experimental results are presented in comparison with existing implementations. The proposed algorithm is provided as an open source software package in order to support reproducible research and encourage collaboration in the field.
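The first step described in the abstract (locating transition points from several self-similarity matrices) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' published implementation: the cosine similarity measure, the element-wise averaging used to combine the matrices, the checkerboard-kernel novelty score, and the toy feature vectors are all hypothetical choices, and the function names are invented here. The second step, supervised classification of the resulting segments, is left out.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def self_similarity_matrix(frames):
    """Pairwise similarity of every frame with every other frame."""
    n = len(frames)
    return [[cosine_similarity(frames[i], frames[j]) for j in range(n)]
            for i in range(n)]

def combine(matrices):
    """Assumed composition rule: element-wise mean of same-sized matrices."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
            for i in range(n)]

def novelty_curve(ssm, half_width):
    """Slide a checkerboard kernel along the diagonal of the matrix.

    Past-past and future-future similarities count positively and
    past-future similarities count negatively, so the score peaks where
    two internally homogeneous regions meet.
    """
    n = len(ssm)
    scores = [0.0] * n
    for t in range(half_width, n - half_width + 1):
        total = 0.0
        for di in range(-half_width, half_width):
            for dj in range(-half_width, half_width):
                sign = 1.0 if (di < 0) == (dj < 0) else -1.0
                total += sign * ssm[t + di][t + dj]
        if t < n:
            scores[t] = total
    return scores

def detect_transitions(scores, threshold):
    """Local maxima of the novelty curve that exceed a threshold."""
    return [t for t in range(1, len(scores) - 1)
            if scores[t] > threshold
            and scores[t] >= scores[t - 1]
            and scores[t] > scores[t + 1]]

def to_segments(transitions, n_frames):
    """Turn transition indices into (start, end) frame ranges, end exclusive."""
    bounds = [0] + transitions + [n_frames]
    return list(zip(bounds[:-1], bounds[1:]))

# Toy data (hypothetical): two feature sets describing the same 12 frames,
# with a change of content at frame 6.
features_a = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 6
features_b = [[2.0, 1.0]] * 6 + [[1.0, 2.0]] * 6

combined = combine([self_similarity_matrix(features_a),
                    self_similarity_matrix(features_b)])
transitions = detect_transitions(novelty_curve(combined, half_width=2),
                                 threshold=2.0)
segments = to_segments(transitions, len(features_a))
print(transitions, segments)
```

In a full pipeline, each resulting segment would then be summarized (for example by averaging its frame-level features) and passed to a binary speech/music classifier trained on labeled data.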
Pages: 25603-25621
Page count: 19
Related papers
11 in total
  • [1] Efficient audio-driven multimedia indexing through similarity-based speech / music discrimination
    Nikolaos Tsipas
    Lazaros Vrysis
    Charalampos Dimoulas
    George Papanikolaou
    [J]. Multimedia Tools and Applications, 2017, 76 : 25603 - 25621
  • [2] Speech-Music-Noise Discrimination in Sound Indexing of Multimedia Documents
    Bouafif, Lamia
    Ellouze, Noureddine
    [J]. SOUND AND VIBRATION, 2018, 52 (06) : 2 - 10
  • [3] Speech/Music Discrimination using Hybrid-Based Feature Extraction for Audio Data Indexing
    Wang, Kun-Ching
    Yang, Yung-Ming
    Yang, Ying-Ru
    [J]. 2017 INTERNATIONAL CONFERENCE ON SYSTEM SCIENCE AND ENGINEERING (ICSSE), 2017, : 515 - 519
  • [4] Web-based live speech-driven lip-sync: An audio-driven rule-based approach
    Llorach, Gerard
    Evans, Alun
    Blat, Josep
    Grimm, Giso
    Hohmann, Volker
    [J]. 2016 8TH INTERNATIONAL CONFERENCE ON GAMES AND VIRTUAL WORLDS FOR SERIOUS APPLICATIONS (VS-GAMES), 2016,
  • [5] Expert system for intelligent audio codification based in speech/music discrimination
    Exposito, J. E. Munoz
    Galan, S. Garcia
    Reyes, N. Ruiz
    Candeas, P. Vera
    Pena, F. Rivas
    [J]. 2006 INTERNATIONAL SYMPOSIUM ON EVOLVING FUZZY SYSTEMS, PROCEEDINGS, 2006, : 318 - +
  • [6] Interoperable Multimedia Metadata through Similarity-Based Semantic Web Service Discovery
    Dietze, Stefan
    Benn, Neil
    Domingue, John
    Conconi, Alex
    Cattaneo, Fabio
    [J]. SEMANTIC MULTIMEDIA, PROCEEDINGS, 2009, 5887 : 77 - +
  • [7] Analysis of an MFCC-based audio indexing system for efficient coding of multimedia sources
    Mubarak, OM
    Ambikairajah, E
    Epps, J
    [J]. ISSPA 2005: THE 8TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOLS 1 AND 2, PROCEEDINGS, 2005, : 619 - 622
  • [8] Speech/music discrimination-based audio characterization using blind watermarking scheme
    Mezghani, Eya
    Charfeddine, Maha
    Nicolas, Henri
    Ben Amar, Chokri
    [J]. JOURNAL OF INFORMATION ASSURANCE AND SECURITY, 2016, 11 (06) : 311 - 321
  • [9] SPEECH/MUSIC DISCRIMINATION BASED ON WARPING TRANSFORMATION AND FUZZY LOGIC FOR INTELLIGENT AUDIO CODING
    Enrique Munoz-Exposito, Jose
    Garcia Galan, Sebastian
    Ruiz Reyes, Nicolas
    Vera Candeas, Pedro
    [J]. APPLIED ARTIFICIAL INTELLIGENCE, 2009, 23 (05) : 427 - 442
  • [10] An RNN-Based Speech-Music Discrimination Used for Hybrid Audio Coder
    Yang, Wanzhao
    Tu, Weiping
    Zheng, Jiaxi
    Zhang, Xiong
    Yang, Yuhong
    Song, Yucheng
    [J]. MULTIMEDIA MODELING, MMM 2018, PT I, 2018, 10704 : 81 - 92