Robust Environmental Sound Recognition With Sparse Key-Point Encoding and Efficient Multispike Learning

Cited by: 14
Authors
Yu, Qiang [1]
Yao, Yanli [1]
Wang, Longbiao [1]
Tang, Huajin [2]
Dang, Jianwu [1]
Tan, Kay Chen [3]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin 300350, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 610065, Peoples R China
[3] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Encoding; Task analysis; Hidden Markov models; Neurons; Biological neural networks; Mel frequency cepstral coefficient; Biological information theory; Brain-like processing; feature extraction; multispike learning; neuromorphic computing; robust sound recognition; spike encoding; spiking neural networks (SNNs); AUTOMATIC SPEECH RECOGNITION; EVENT CLASSIFICATION; FEATURES; NETWORKS; NEURON;
DOI
10.1109/TNNLS.2020.2978764
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The capability for environmental sound recognition (ESR) can determine the fitness of individuals in a way to avoid dangers or pursue opportunities when critical sound events occur. It still remains mysterious about the fundamental principles of biological systems that result in such a remarkable ability. Additionally, the practical importance of ESR has attracted an increasing amount of research attention, but the chaotic and nonstationary difficulties continue to make it a challenging task. In this article, we propose a spike-based framework from a more brain-like perspective for the ESR task. Our framework is a unifying system with consistent integration of three major functional parts which are sparse encoding, efficient learning, and robust readout. We first introduce a simple sparse encoding, where key points are used for feature representation, and demonstrate its generalization to both spike- and nonspike-based systems. Then, we evaluate the learning properties of different learning rules in detail with our contributions being added for improvements. Our results highlight the advantages of multispike learning, providing a selection reference for various spike-based developments. Finally, we combine the multispike readout with the other parts to form a system for ESR. Experimental results show that our framework performs the best as compared to other baseline approaches. In addition, we show that our spike-based framework has several advantageous characteristics including early decision making, small dataset acquiring, and ongoing dynamic processing. Our framework is the first attempt to apply the multispike characteristic of nervous neurons to ESR. The outstanding performance of our approach would potentially contribute to draw more research efforts to push the boundaries of spike-based paradigm to a new horizon.
Pages: 625-638
Number of pages: 14
Related papers
50 in total
  • [31] A sparse object category model for efficient learning and exhaustive recognition
    Fergus, R
    Perona, P
    Zisserman, A
    2005 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2005, : 380 - 387
  • [32] A sparse object category model for efficient learning and complete recognition
    Fergus, Rob
    Perona, Pietro
    Zisserman, Andrew
    TOWARD CATEGORY-LEVEL OBJECT RECOGNITION, 2006, 4170 : 443 - +
  • [33] Robust Representation and Recognition of Facial Emotions Using Extreme Sparse Learning
    Shojaeilangari, Seyedehsamaneh
    Yau, Wei-Yun
    Nandakumar, Karthik
    Li, Jun
    Teoh, Eam Khwang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (07) : 2140 - 2152
  • [34] Sparse Simultaneous Recurrent Deep Learning for Robust Facial Expression Recognition
    Alam, Mahbubul
    Vidyaratne, Lasitha S.
    Iftekharuddin, Khan M.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (10) : 4905 - 4916
  • [35] Key-Point Detection Algorithm of Deep Learning Can Predict Lower Limb Alignment with Simple Knee Radiographs
    Nam, Hee Seung
    Park, Sang Hyun
    Ho, Jade Pei Yuik
    Park, Seong Yun
    Cho, Joon Hee
    Lee, Yong Seuk
    JOURNAL OF CLINICAL MEDICINE, 2023, 12 (04)
  • [36] Sparse Depth Calculation Using Real-Time Key-Point Detection and Structure from Motion for Advanced Driver Assist Systems
    Prakash, Charan D.
    Li, Jinjin
    Akhbari, Farshad
    Karam, Lina J.
    ADVANCES IN VISUAL COMPUTING (ISVC 2014), PT 1, 2014, 8887 : 740 - 751
  • [37] Robust speech recognition based on discriminative learning of environmental features
    Han, J.Q.
    Gao, W.
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2001, 29 (02): : 196 - 198
  • [38] Efficient algorithm for sparse coding and dictionary learning with applications to face recognition
    Zhao, Zhong
    Feng, Guocan
    JOURNAL OF ELECTRONIC IMAGING, 2015, 24 (02)
  • [39] Efficient Learning of Sparse, Distributed, Convolutional Feature Representations for Object Recognition
    Sohn, Kihyuk
    Jung, Dae Yon
    Lee, Honglak
    Hero, Alfred O., III
    2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2011, : 2643 - 2650
  • [40] A Novel Learning Dictionary for Sparse Coding-Based Key Point Detection
    Hong, Phuoc-Thanh
    Guan, Ling
    IEEE MULTIMEDIA, 2023, 30 (04) : 47 - 60