Enhancing Speech and Music Discrimination Through the Integration of Static and Dynamic Features

Citations: 0
Authors
Chen, Liangwei [1]
Zhou, Xiren [1]
Tu, Qiang [2]
Chen, Huanhuan [1]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Anhui Prov Hosp, Hefei, Peoples R China
Funding
National Key R&D Program of China
Keywords
reservoir computing model; stacked autoencoder; speech-music classification; audio processing; CLASSIFICATION;
DOI
10.21437/Interspeech.2024-1596
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Audio is inherently temporal data: features extracted from each segment evolve over time, yielding dynamic traits. Relative to the acoustic characteristics inherent in raw audio features, these dynamics primarily serve as complementary aids for audio classification. This paper employs a reservoir computing model to fit audio feature sequences efficiently, capturing the feature-sequence dynamics in the readout models without the need for offline iterative training. Stacked autoencoders then integrate the extracted static features (i.e., raw audio features) with the captured dynamics, resulting in more stable and effective classification performance. The entire framework is called the Static-Dynamic Integration Network (SDIN). Experiments demonstrate the effectiveness of SDIN in speech-music classification tasks.
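The idea of capturing sequence dynamics in a readout model can be illustrated with a generic echo state network. The sketch below is not the paper's implementation; it only assumes the common reservoir computing setup, in which a fixed random reservoir is driven by the feature frames and a linear readout is fitted in closed form by ridge regression, so no iterative training is involved. The function names, the next-frame prediction target, the reservoir size, and the use of the per-dimension mean as the "static" summary are all assumptions made for illustration. The stacked-autoencoder fusion step is omitted; the two parts are simply concatenated.

```python
import numpy as np


def make_reservoir(n_inputs, n_units=50, spectral_radius=0.9, seed=0):
    """Create fixed random input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_units, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def readout_dynamics(frames, w_in, w, ridge=1e-2):
    """Fit a readout predicting frame t+1 from the reservoir state at t.

    frames: array of shape (T, d), one feature vector per audio segment.
    Returns the flattened readout weights, used here as the dynamic descriptor.
    """
    n_units = w.shape[0]
    state = np.zeros(n_units)
    states = []
    for x in frames[:-1]:
        state = np.tanh(w_in @ x + w @ state)
        states.append(state)
    s = np.asarray(states)   # (T-1, n_units)
    y = frames[1:]           # (T-1, d) next-frame targets
    # Closed-form ridge regression: no gradient steps, no epochs.
    a = s.T @ s + ridge * np.eye(n_units)
    w_out = np.linalg.solve(a, s.T @ y)   # (n_units, d)
    return w_out.ravel()


def static_dynamic_vector(frames, w_in, w):
    """Concatenate a static summary (assumed: frame mean) with the dynamics."""
    static = frames.mean(axis=0)
    dynamic = readout_dynamics(frames, w_in, w)
    return np.concatenate([static, dynamic])


if __name__ == "__main__":
    # Synthetic stand-in for a sequence of 13-dimensional audio features.
    frames = np.random.default_rng(1).normal(size=(200, 13))
    w_in, w = make_reservoir(n_inputs=13)
    vec = static_dynamic_vector(frames, w_in, w)
    print(vec.shape)
```

Because the reservoir weights are shared across all clips, the readout weights fitted for different clips live in the same space and can be compared or passed to a downstream classifier.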
Pages: 4318-4322
Number of pages: 5