Improvement to speech-music discrimination using sinusoidal model based features

Cited by: 0

Authors:

Jalil Shirazi

Shahrokh Ghaemmaghami

Affiliations:

[1] Islamic Azad University, Science & Research Branch

[2] Sharif University of Technology

Source:

Multimedia Tools and Applications | 2010 / Vol. 50

Keywords:

Audio classification; Sinusoidal model

DOI:

Not available

Abstract:

This paper presents a model-based audio content analysis for classifying mixed speech-music audio signals into speech and music. A new set of features based on sinusoidal modeling of audio signals is presented and evaluated. The feature set, which includes the variance of the birth frequencies and the duration of the longest frequency track in the sinusoidal model as measures of harmony and signal continuity, is introduced and discussed in detail. These features are used as inputs to an audio classifier and compared with typical features. Their performance is evaluated by classifying audio into speech and music with both GMM (Gaussian Mixture Model) and SVM (Support Vector Machine) classifiers. Experimental results show that the proposed features are effective for speech/music discrimination: using only two sinusoidal model features, extracted from 1-s segments of the signal, we achieved 96.84% classification accuracy. Experimental comparisons also show that the sinusoidal model features outperform popular time-domain and frequency-domain features in audio classification.
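The abstract names the two features but does not give their exact definitions, so the following is only a minimal sketch under stated assumptions: frequency tracks are taken as already extracted by some sinusoidal analysis front end, a track's birth frequency is taken to be the frequency in its first frame, the variance is the population variance, and the 10 ms hop is an arbitrary choice. The `Track` container and `segment_features` function are hypothetical names introduced for illustration and do not come from the paper.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import List, Tuple


@dataclass
class Track:
    """One sinusoidal frequency track (hypothetical container, not from the paper)."""
    start_frame: int        # analysis frame in which the track is born
    freqs_hz: List[float]   # one frequency estimate per frame while the track lives


def segment_features(tracks: List[Track], hop_s: float = 0.01) -> Tuple[float, float]:
    """Return (birth-frequency variance in Hz^2, longest track duration in seconds).

    Assumption: birth frequency = frequency in the first frame of a track.
    Assumption: hop_s is the spacing between analysis frames (10 ms here).
    """
    tracks = [t for t in tracks if t.freqs_hz]  # ignore empty tracks
    if not tracks:
        return 0.0, 0.0
    birth_freqs = [t.freqs_hz[0] for t in tracks]
    birth_var = pvariance(birth_freqs)
    longest_s = max(len(t.freqs_hz) for t in tracks) * hop_s
    return birth_var, longest_s


# Toy tracks standing in for the partials found in one 1-s segment.
tracks = [
    Track(0, [220.0, 221.0, 220.5, 220.0]),
    Track(1, [440.0, 441.0]),
    Track(2, [660.0]),
]
birth_var, longest_s = segment_features(tracks)
```

In a full system, the resulting two-element vector per 1-s segment would be the input to a classifier such as the GMM or SVM mentioned in the abstract; that stage is not shown here.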

Pages: 415-435

Page count: 20
