Robust Environmental Sound Recognition With Sparse Key-Point Encoding and Efficient Multispike Learning

Cited by: 14
Authors
Yu, Qiang [1]
Yao, Yanli [1]
Wang, Longbiao [1]
Tang, Huajin [2]
Dang, Jianwu [1]
Tan, Kay Chen [3]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin 300350, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 610065, Peoples R China
[3] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Encoding; Task analysis; Hidden Markov models; Neurons; Biological neural networks; Mel frequency cepstral coefficient; Biological information theory; Brain-like processing; feature extraction; multispike learning; neuromorphic computing; robust sound recognition; spike encoding; spiking neural networks (SNNs); AUTOMATIC SPEECH RECOGNITION; EVENT CLASSIFICATION; FEATURES; NETWORKS; NEURON;
DOI
10.1109/TNNLS.2020.2978764
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The capability for environmental sound recognition (ESR) can determine the fitness of individuals, helping them avoid dangers or pursue opportunities when critical sound events occur. The fundamental principles of biological systems that give rise to such a remarkable ability remain mysterious. Additionally, the practical importance of ESR has attracted an increasing amount of research attention, but the chaotic and nonstationary nature of environmental sounds continues to make it a challenging task. In this article, we propose a spike-based framework for the ESR task from a more brain-like perspective. Our framework is a unified system that consistently integrates three major functional parts: sparse encoding, efficient learning, and robust readout. We first introduce a simple sparse encoding, where key points are used for feature representation, and demonstrate its generalization to both spike- and nonspike-based systems. Then, we evaluate the properties of different learning rules in detail and add our own contributions to improve them. Our results highlight the advantages of multispike learning, providing a selection reference for various spike-based developments. Finally, we combine the multispike readout with the other parts to form a system for ESR. Experimental results show that our framework performs the best compared with the other baseline approaches. In addition, we show that our spike-based framework has several advantageous characteristics, including early decision making, a need for only small training datasets, and ongoing dynamic processing. Our framework is the first attempt to apply the multispike characteristic of neurons in the nervous system to ESR. The outstanding performance of our approach could help draw more research effort toward pushing the boundaries of the spike-based paradigm to a new horizon.
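The abstract describes a sparse encoding in which key points serve as the feature representation for a spike-based system. The sketch below is only one plausible reading of that idea, not the authors' implementation: it assumes key points are local maxima of a magnitude spectrogram along the time or frequency axis, and that each key point becomes one spike on the afferent for its frequency bin. The function names `extract_key_points` and `key_points_to_spikes`, the `half_width` neighborhood size, and the 10 ms frame duration are all hypothetical.

```python
import numpy as np

def extract_key_points(spec, half_width=2):
    """Return (freq_bin, frame) pairs that are local maxima of `spec`.

    Assumption: a point counts as a key point if it is positive and is the
    largest value within +/- half_width bins along frequency OR along time.
    Ties are accepted as peaks.
    """
    n_freq, n_time = spec.shape
    points = []
    for f in range(n_freq):
        for t in range(n_time):
            v = spec[f, t]
            if v <= 0:
                continue  # skip silent bins so flat zero regions yield nothing
            f_lo, f_hi = max(0, f - half_width), min(n_freq, f + half_width + 1)
            t_lo, t_hi = max(0, t - half_width), min(n_time, t + half_width + 1)
            is_freq_peak = v >= spec[f_lo:f_hi, t].max()
            is_time_peak = v >= spec[f, t_lo:t_hi].max()
            if is_freq_peak or is_time_peak:
                points.append((f, t))
    return points

def key_points_to_spikes(points, n_freq, frame_seconds=0.01):
    """Map key points to spike trains: one afferent per frequency bin.

    Assumption: the spike time is the frame index times the frame duration.
    """
    spikes = [[] for _ in range(n_freq)]
    for f, t in points:
        spikes[f].append(t * frame_seconds)
    return spikes

# Toy spectrogram: 5 frequency bins, 6 frames, a single energy peak.
spec = np.zeros((5, 6))
spec[2, 3] = 1.0
points = extract_key_points(spec)
spike_trains = key_points_to_spikes(points, n_freq=spec.shape[0])
```

In this toy case the encoding is sparse by construction: only the afferent for bin 2 fires, and it fires once. A downstream multispike learner would take such spike trains as input.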
Pages: 625-638
Number of pages: 14
Related papers
50 in total
  • [41] Sparse multi-stage regularized feature learning for robust face recognition
    Borgi, Mohamed Anouar
    Labate, Demetrio
    El Arbi, Maher
    Ben Amar, Chokri
    EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (01) : 269 - 279
  • [42] Environmental sound recognition on embedded devices using deep learning: a review
    Gairi, Pau
    Palleja, Tomas
    Tresanchez, Marcel
    ARTIFICIAL INTELLIGENCE REVIEW, 2025, 58 (06)
  • [43] Machine Learning Algorithms for Environmental Sound Recognition: Towards Soundscape Semantics
    Bountourakis, Vasileios
    Vrysis, Lazaros
    Papanikolaou, George
    PROCEEDINGS OF THE 10TH AUDIO MOSTLY: A CONFERENCE ON INTERACTION WITH SOUND, AM'15, 2015
  • [44] Human action recognition using key point detection and machine learning
    Archana, M.
    Kavitha, S.
    Vathsala, A. Vani
    2024 4TH INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND SOCIAL NETWORKING, ICPCSN 2024, 2024, : 410 - 413
  • [45] Robust iris recognition using sparse error correction model and discriminative dictionary learning
    Song, Yun
    Cao, Wei
    He, Zunliang
    NEUROCOMPUTING, 2014, 137 : 198 - 204
  • [46] A Bayesian prediction approach to robust speech recognition and online environmental learning
    Chien, JT
    SPEECH COMMUNICATION, 2002, 37 (3-4) : 321 - 334
  • [47] Block Sparse Bayesian Learning over Local Dictionary for Robust SAR Target Recognition
    Li, Chenyu
    Liu, Guohua
    INTERNATIONAL JOURNAL OF OPTICS, 2020, 2020
  • [48] CasViGE: Learning robust point cloud registration with cascaded visual-geometric encoding
    Qin, Zheng
    Wang, Changjian
    Peng, Yuxing
    Xu, Kai
    COMPUTER AIDED GEOMETRIC DESIGN, 2023, 104
  • [49] Local feature approach to dorsal hand vein recognition by Centroid-based Circular Key-point Grid and fine-grained matching
    Huang, Di
    Zhang, Renke
    Yin, Yuan
    Wang, Yiding
    Wang, Yunhong
    IMAGE AND VISION COMPUTING, 2017, 58 : 266 - 277
  • [50] Optically Non-Contact Cross-Country Skiing Action Recognition Based on Key-Point Collaborative Estimation and Motion Feature Extraction
    Qi, Jiashuo
    Li, Dongguang
    He, Jian
    Wang, Yu
    SENSORS, 2023, 23 (07)