Efficient audio-driven multimedia indexing through similarity-based speech/music discrimination

Cited by: 18

Authors:

Tsipas, Nikolaos [1]

Vrysis, Lazaros [1]

Dimoulas, Charalampos [1]

Papanikolaou, George [1]

Affiliations:

[1] Aristotle Univ Thessaloniki, Thessaloniki 54124, Greece

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2017 / Volume 76 / Issue 24

Keywords:

Speech/music discrimination; Self-similarity matrix analysis; Transition point detection; Supervised learning; MUSIC;

DOI:

10.1007/s11042-016-4315-0

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In this paper, an audio-driven algorithm for the detection of speech and music events in multimedia content is introduced. The proposed approach is based on the hypothesis that short-time frame-level discrimination performance can be enhanced by identifying transition points between longer, semantically homogeneous segments of audio. In this context, a two-step segmentation approach is employed in order to initially identify transition points between the homogeneous regions and subsequently classify the derived segments using a supervised binary classifier. The transition point detection mechanism is based on the analysis and composition of multiple self-similarity matrices, generated using different audio feature sets. The implemented technique aims at discriminating events focusing on transition point detection with high temporal resolution, a target that is also reflected in the adopted assessment methodology. Thereafter, multimedia indexing can be efficiently deployed (for both audio and video sequences), incorporating the processes of high resolution temporal segmentation and semantic annotation extraction. The system is evaluated against three publicly available datasets and experimental results are presented in comparison with existing implementations. The proposed algorithm is provided as an open source software package in order to support reproducible research and encourage collaboration in the field.
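The transition point detection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes cosine similarity for the self-similarity matrices, a plain average as the rule for composing matrices from different feature sets, and a Foote-style checkerboard kernel for the novelty score, none of which are confirmed by the abstract. All function names and parameters (`half_width`, `rel_threshold`) are hypothetical, and the second stage (supervised classification of the derived segments) is not shown.

```python
import numpy as np


def self_similarity(features):
    """Cosine self-similarity matrix for a (frames x dims) feature array."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against zero vectors
    return unit @ unit.T


def checkerboard_kernel(half_width):
    """+1 on the two within-segment blocks, -1 on the two cross blocks."""
    signs = np.concatenate([-np.ones(half_width), np.ones(half_width)])
    return np.outer(signs, signs)


def novelty_curve(ssm, half_width):
    """Correlate the kernel with the matrix along its main diagonal."""
    kernel = checkerboard_kernel(half_width)
    n = ssm.shape[0]
    novelty = np.zeros(n)
    for i in range(half_width, n - half_width):
        window = ssm[i - half_width:i + half_width,
                     i - half_width:i + half_width]
        novelty[i] = np.sum(window * kernel)
    return novelty


def detect_transitions(feature_sets, half_width=8, rel_threshold=0.5):
    """Return candidate transition frames and the novelty curve.

    Assumption: matrices from different feature sets are combined by
    averaging; the paper's actual composition rule may differ.
    """
    combined = np.mean([self_similarity(f) for f in feature_sets], axis=0)
    novelty = novelty_curve(combined, half_width)
    threshold = rel_threshold * novelty.max()
    # Keep local maxima of the novelty curve that exceed the threshold.
    peaks = [i for i in range(1, len(novelty) - 1)
             if novelty[i] >= threshold
             and novelty[i] > novelty[i - 1]
             and novelty[i] >= novelty[i + 1]]
    return peaks, novelty
```

Each element of `feature_sets` is expected to be an array with one row per audio frame, and all arrays must cover the same number of frames. The returned frame indices would then delimit the segments passed on to a binary speech/music classifier.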

Pages: 25603-25621

Page count: 19

11 items in total

[1] Efficient audio-driven multimedia indexing through similarity-based speech / music discrimination
Nikolaos Tsipas
Lazaros Vrysis
Charalampos Dimoulas
George Papanikolaou
[J]. Multimedia Tools and Applications, 2017, 76 : 25603 - 25621
[2] Speech-Music-Noise Discrimination in Sound Indexing of Multimedia Documents
Bouafif, Lamia
Ellouze, Noureddine
[J]. SOUND AND VIBRATION, 2018, 52 (06): : 2 - 10
[3] Speech/Music Discrimination using Hybrid-Based Feature Extraction for Audio Data Indexing
Wang, Kun-Ching
Yang, Yung-Ming
Yang, Ying-Ru
[J]. 2017 INTERNATIONAL CONFERENCE ON SYSTEM SCIENCE AND ENGINEERING (ICSSE), 2017, : 515 - 519
[4] Web-based live speech-driven lip-sync: An audio-driven rule-based approach
Llorach, Gerard
Evans, Alun
Blat, Josep
Grimm, Giso
Hohmann, Volker
[J]. 2016 8TH INTERNATIONAL CONFERENCE ON GAMES AND VIRTUAL WORLDS FOR SERIOUS APPLICATIONS (VS-GAMES), 2016,
[5] Expert system for intelligent audio codification based in speech/music discrimination
Exposito, J. E. Munoz
Galan, S. Garcia
Reyes, N. Ruiz
Candeas, P. Vera
Pena, F. Rivas
[J]. 2006 INTERNATIONAL SYMPOSIUM ON EVOLVING FUZZY SYSTEMS, PROCEEDINGS, 2006, : 318 - +
[6] Interoperable Multimedia Metadata through Similarity-Based Semantic Web Service Discovery
Dietze, Stefan
Benn, Neil
Domingue, John
Conconi, Alex
Cattaneo, Fabio
[J]. SEMANTIC MULTIMEDIA, PROCEEDINGS, 2009, 5887 : 77 - +
[7] Analysis of an MFCC-based audio indexing system for efficient coding of multimedia sources
Mubarak, OM
Ambikairajah, E
Epps, J
[J]. ISSPA 2005: THE 8TH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOLS 1 AND 2, PROCEEDINGS, 2005, : 619 - 622
[8] Speech/music discrimination-based audio characterization using blind watermarking scheme
Mezghani, Eya
Charfeddine, Maha
Nicolas, Henri
Ben Amar, Chokri
[J]. JOURNAL OF INFORMATION ASSURANCE AND SECURITY, 2016, 11 (06): : 311 - 321
[9] SPEECH/MUSIC DISCRIMINATION BASED ON WARPING TRANSFORMATION AND FUZZY LOGIC FOR INTELLIGENT AUDIO CODING
Enrique Munoz-Exposito, Jose
Garcia Galan, Sebastian
Ruiz Reyes, Nicolas
Vera Candeas, Pedro
[J]. APPLIED ARTIFICIAL INTELLIGENCE, 2009, 23 (05) : 427 - 442
[10] An RNN-Based Speech-Music Discrimination Used for Hybrid Audio Coder
Yang, Wanzhao
Tu, Weiping
Zheng, Jiaxi
Zhang, Xiong
Yang, Yuhong
Song, Yucheng
[J]. MULTIMEDIA MODELING, MMM 2018, PT I, 2018, 10704 : 81 - 92
