Enhancing Speech and Music Discrimination Through the Integration of Static and Dynamic Features

Cited by: 0
Authors
Chen, Liangwei [1]
Zhou, Xiren [1]
Tut, Qiang [2]
Chen, Huanhuan [1]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Anhui Prov Hosp, Hefei, Peoples R China
Source
Funding
National Key Research and Development Program of China
Keywords
reservoir computing model; stacked autoencoder; speech-music classification; audio processing; classification
DOI
10.21437/Interspeech.2024-1596
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Audio is inherently temporal data: the features extracted from each segment evolve over time, giving rise to dynamic traits. Relative to the acoustic characteristics carried by the raw audio features, these dynamics serve primarily as complementary cues for audio classification. This paper employs a reservoir computing model to fit audio feature sequences efficiently, capturing the feature-sequence dynamics in the readout models without the need for offline iterative training. Stacked autoencoders then integrate the extracted static features (i.e., raw audio features) with the captured dynamics, resulting in more stable and effective classification performance. The entire framework is called the Static-Dynamic Integration Network (SDIN). Experiments demonstrate the effectiveness of SDIN in speech-music classification tasks.
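The idea of capturing sequence dynamics in a readout model can be illustrated with a minimal sketch. It is not the authors' SDIN implementation: it assumes an echo state network as the reservoir, a ridge-regression readout trained to predict the next frame, mean and standard deviation as a stand-in for the static features, and plain concatenation in place of the stacked autoencoders. All function names, sizes, and hyperparameters are hypothetical.

```python
import numpy as np


def make_reservoir(n_inputs, n_units=20, spectral_radius=0.9, seed=0):
    """Build fixed random input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_units, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def dynamic_features(frames, w_in, w, ridge=1e-2):
    """Fit a linear readout that predicts frame t+1 from the reservoir state
    at time t, and return the flattened readout weights as the descriptor
    of the sequence dynamics. frames has shape (T, d)."""
    state = np.zeros(w.shape[0])
    states = []
    for frame in frames[:-1]:
        state = np.tanh(w_in @ frame + w @ state)
        states.append(state)
    x = np.asarray(states)   # (T-1, n_units)
    y = frames[1:]           # (T-1, d)
    # Closed-form ridge regression: no iterative training is involved.
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    w_out = np.linalg.solve(gram, x.T @ y)   # (n_units, d)
    return w_out.ravel()


def static_features(frames):
    """Assumed stand-in for static features: per-dimension mean and std."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])


def combined_features(frames, w_in, w):
    """Concatenate static and dynamic descriptors (the paper fuses them
    with stacked autoencoders instead; concatenation is a simplification)."""
    return np.concatenate(
        [static_features(frames), dynamic_features(frames, w_in, w)]
    )


# Synthetic stand-in for a sequence of 100 frames of 13-dimensional features.
demo_frames = np.random.default_rng(1).normal(size=(100, 13))
demo_w_in, demo_w = make_reservoir(n_inputs=13)
demo_vector = combined_features(demo_frames, demo_w_in, demo_w)
print(demo_vector.shape)
```

Because the reservoir weights are fixed and the readout has a closed-form solution, each clip yields a fixed-length vector regardless of its duration, which is what makes the readout usable as input to a downstream classifier.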
Pages: 4318-4322
Number of pages: 5