Enhancing Speech and Music Discrimination Through the Integration of Static and Dynamic Features

Citations: 0
Authors
Chen, Liangwei [1 ]
Zhou, Xiren [1 ]
Tut, Qiang [2 ]
Chen, Huanhuan [1 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Anhui Prov Hosp, Hefei, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
reservoir computing model; stacked autoencoder; speech-music classification; audio processing; CLASSIFICATION;
DOI
10.21437/Interspeech.2024-1596
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Audio is inherently temporal data: the features extracted from successive segments evolve over time, yielding dynamic traits. Relative to the acoustic characteristics inherent in the raw audio features, these dynamics primarily serve as complementary cues for audio classification. This paper employs a reservoir computing model to fit audio feature sequences efficiently, capturing the feature-sequence dynamics in the readout models without offline iterative training. Stacked autoencoders then integrate the extracted static features (i.e., the raw audio features) with the captured dynamics, yielding more stable and effective classification performance. The entire framework is called the Static-Dynamic Integration Network (SDIN). Experiments demonstrate the effectiveness of SDIN on speech-music classification tasks.
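To make the pipeline described in the abstract concrete, below is a minimal sketch, not the authors' implementation: it assumes an echo-state-style reservoir, a one-step-ahead ridge-regression readout whose fitted weights serve as the "dynamic" descriptor, and simple mean/std statistics as the "static" descriptor. All function names, hyperparameters (reservoir size, spectral radius, ridge penalty), and the toy input are illustrative assumptions; the stacked-autoencoder fusion and the final speech/music classifier are only indicated by a comment, since their exact design is not given here.

```python
# Minimal sketch (not the authors' code): a fixed random reservoir drives each audio
# feature sequence; a closed-form ridge readout (no iterative training) captures its
# dynamics, and the readout weights are concatenated with static feature statistics.
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n_in, n_res=100, spectral_radius=0.9, input_scale=0.5):
    """Randomly initialised, fixed reservoir (assumed hyperparameters)."""
    W_in = rng.uniform(-input_scale, input_scale, size=(n_res, n_in))
    W = rng.standard_normal((n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # echo-state scaling
    return W_in, W

def reservoir_states(seq, W_in, W):
    """Run a feature sequence (T x n_in) through the reservoir."""
    states = np.zeros((seq.shape[0], W.shape[0]))
    x = np.zeros(W.shape[0])
    for t in range(seq.shape[0]):
        x = np.tanh(W_in @ seq[t] + W @ x)
        states[t] = x
    return states

def dynamic_descriptor(seq, W_in, W, ridge=1e-2):
    """Fit a one-step-ahead linear readout in closed form (ridge regression);
    the flattened readout weights summarise the sequence dynamics."""
    S = reservoir_states(seq[:-1], W_in, W)      # reservoir states predict the next frame
    Y = seq[1:]
    A = S.T @ S + ridge * np.eye(S.shape[1])
    W_out = np.linalg.solve(A, S.T @ Y)          # (n_res x n_in) readout weights
    return W_out.ravel()

def static_descriptor(seq):
    """Static summary of the raw audio features (per-dimension mean and std)."""
    return np.concatenate([seq.mean(axis=0), seq.std(axis=0)])

# Toy usage: 20-dim frame-level features (e.g. MFCC-like), 200 frames per clip.
n_in = 20
W_in, W = make_reservoir(n_in)
clip = rng.standard_normal((200, n_in))
fused_input = np.concatenate([static_descriptor(clip),
                              dynamic_descriptor(clip, W_in, W)])
# In the SDIN framework this static+dynamic vector would next be integrated by
# stacked autoencoders and passed to a speech/music classifier.
print(fused_input.shape)
```

The key property this sketch illustrates is that the readout is solved in closed form per clip, so the dynamics are captured without offline iterative training, consistent with the reservoir-computing motivation in the abstract.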
Pages: 4318-4322
Number of pages: 5