Explainable audio CNNs applied to neural decoding: sound category identification from inferior colliculus

Cited by: 3
Authors
Ozcan, Fatma [1 ]
Alkan, Ahmet [1 ]
Affiliations
[1] Kahramanmaras Sutcu Imam Univ, Elect & Elect Engn Dept, TR-46100 Kahramanmaras, Turkiye
Keywords
Explainability; Interpretation; Pre-trained audio networks; Temporal correlation; Time resolution; Multiunit activity
DOI
10.1007/s11760-023-02825-3
Chinese Library Classification (CLC)
TM (electrical technology); TN (electronic technology, communication technology)
Discipline codes
0808; 0809
Abstract
Recent work has sought to understand how the inferior colliculus (IC) processes sound. Here, we use neural temporal correlation in the inferior colliculus to identify and categorise the sound that was used as a stimulus. Classification accuracy gradually deteriorates for shorter durations, so we tried to improve it with deep learning methods for audio, using processing windows of 62.5 ms, 250 ms and 1000 ms. We demonstrate that 62.5 ms could be an integration time for temporal correlation. The neural data contain sound features that can be processed easily by artificial neural networks dedicated to audio signals. Network architectures dedicated to audio classification, such as YAMNet, VGGish and OpenL3, used in transfer learning, classify the neural data quickly and with very high accuracy compared with image classification networks. With unshuffled correlation images we already obtain high accuracy; with noiseless, shuffled correlation images we obtain the best accuracy: 100% for 1000 ms, 96.7% for 250 ms and 93.8% for 62.5 ms, with the OpenL3 network. To evaluate how much the input features of a neural network contribute to its outputs, we use Explainable Artificial Intelligence, applying three explainability methods, Grad-CAM, LIME and Occlusion Sensitivity, to obtain three sensitivity maps. The network uses regions of very high or very low correlation to make its predictions.
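As a rough illustration of the correlation-based input described in the abstract, the sketch below builds channel-by-channel temporal correlation "images" from windowed multi-unit activity over the three window lengths mentioned above. The 32-channel layout, the 1 kHz sampling rate and the simulated data are assumptions for illustration only; the paper's recordings and preprocessing may differ.

```python
# Sketch: temporal correlation "images" from windowed multi-unit activity.
# Channel count, sampling rate and the random data are illustrative assumptions.
import numpy as np

def correlation_images(mua, fs=1000.0, window_ms=62.5):
    """Split multi-unit activity (channels x samples) into non-overlapping
    windows and compute one channel-by-channel Pearson correlation matrix
    per window."""
    win = int(round(window_ms * fs / 1000.0))      # samples per window
    n_windows = mua.shape[1] // win
    images = []
    for w in range(n_windows):
        segment = mua[:, w * win:(w + 1) * win]    # channels x win
        images.append(np.corrcoef(segment))        # channels x channels
    return np.stack(images)                        # windows x ch x ch

# Example: 32 recording channels, 4 s of simulated activity at 1 kHz.
rng = np.random.default_rng(0)
mua = rng.standard_normal((32, 4000))
imgs = correlation_images(mua, fs=1000.0, window_ms=62.5)
print(imgs.shape)  # (64, 32, 32): one correlation image per ~62.5 ms window
```

The abstract also names Occlusion Sensitivity as one of the three explanation methods. Below is a generic, hedged sketch of an occlusion-sensitivity map for an image classifier; `predict_fn`, the patch size and the dummy classifier are placeholders for illustration, not the authors' implementation.

```python
# Sketch of an occlusion-sensitivity map: occlude patches of the input and
# measure how much the probability of the target class drops.
import numpy as np

def occlusion_sensitivity(image, predict_fn, target_class, patch=8, stride=4,
                          fill_value=0.0):
    """Slide a square patch over the image, occlude it, and record the drop
    in the target-class probability; larger drop = more important region."""
    h, w = image.shape[:2]
    base = predict_fn(image[None])[0, target_class]
    heatmap = np.zeros((h, w))
    counts = np.zeros((h, w))
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch, ...] = fill_value
            prob = predict_fn(occluded[None])[0, target_class]
            heatmap[y:y + patch, x:x + patch] += base - prob
            counts[y:y + patch, x:x + patch] += 1
    return heatmap / np.maximum(counts, 1)

# Toy usage with a dummy "classifier" that scores images by mean intensity.
def dummy_predict(batch):
    score = batch.reshape(len(batch), -1).mean(axis=1)
    return np.stack([score, 1 - score], axis=1)

corr_image = np.random.default_rng(1).random((32, 32))
sens = occlusion_sensitivity(corr_image, dummy_predict, target_class=0)
print(sens.shape)  # (32, 32) sensitivity map over the correlation image
```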
Pages: 1193-1204
Page count: 12
Related papers
37 records in total
  • [21] The time course of sound category identification: Insights from acoustic features
    Ogg, Mattson
    Slevc, L. Robert
    Idsardi, William J.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2017, 142 (06): 3459 - 3473
  • [22] SoundCount: Sound Counting from Raw Audio with Dyadic Decomposition Neural Network
    He, Yuhang
    Dai, Zhuangzhuang
    Trigoni, Niki
    Chen, Long
    Markham, Andrew
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 11, 2024: 12421 - 12429
  • [23] Impact Sound-Based Surface Identification Using Smart Audio Sensors With Deep Neural Networks
    Ryu, Semin
    Kim, Seung-Chan
    IEEE SENSORS JOURNAL, 2020, 20 (18) : 10936 - 10944
  • [24] Neural correlates and mechanisms of spatial release from masking: Single-unit and population responses in the inferior colliculus
    Lane, CC
    Delgutte, B
    JOURNAL OF NEUROPHYSIOLOGY, 2005, 94 (02) : 1180 - 1198
  • [25] Decoding of the Sound Frequency from the Steady-state Neural Activities in Rat Auditory Cortex
    Shiramatsu, Tomoyo I.
    Noda, Takahiro
    Kanzaki, Ryohei
    Takahashi, Hirokazu
    2013 35TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2013: 5598 - 5601
  • [26] KNOWLEDGE TRANSFER FROM WEAKLY LABELED AUDIO USING CONVOLUTIONAL NEURAL NETWORK FOR SOUND EVENTS AND SCENES
    Kumar, Anurag
    Khadkevich, Maksim
    Fugen, Christian
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 326 - 330
  • [27] Topic Identification from Audio Recordings using Rich Recognition Results and Neural Network based Classifiers
    Gemello, Roberto
    Mana, Franco
    Batzu, Pier Domenico
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011: 2156 - 2159
  • [28] Application of Convolutional Neural Network for Decoding of 12-Lead Electrocardiogram from a Frequency-Modulated Audio Stream (Sonified ECG)
    Krasteva, Vessela
    Iliev, Ivo
    Tabakov, Serafim
    SENSORS, 2024, 24 (06)
  • [29] On Explainable Closed-Set Source Device Identification Using Log-Mel Spectrograms From Videos' Audio: A Grad-CAM Approach
    Korgialas, Christos
    Tzolopoulos, Georgios
    Kotropoulos, Constantine
    IEEE ACCESS, 2024, 12 : 121822 - 121836
  • [30] A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells
    Brown, EN
    Frank, LM
    Tang, DD
    Quirk, MC
    Wilson, MA
    JOURNAL OF NEUROSCIENCE, 1998, 18 (18): 7411 - 7425