Explainable audio CNNs applied to neural decoding: sound category identification from inferior colliculus

Cited by: 3
Authors
Ozcan, Fatma [1 ]
Alkan, Ahmet [1 ]
Affiliations
[1] Kahramanmaras Sutcu Imam Univ, Elect & Elect Engn Dept, TR-46100 Kahramanmaras, Turkiye
Keywords
Explainability; Interpretation; Pre-trained audio networks; Temporal correlation; Time resolution; Multiunit activity
DOI
10.1007/s11760-023-02825-3
Chinese Library Classification (CLC)
TM (electrical technology); TN (electronic technology, communication technology)
Discipline codes
0808; 0809
Abstract
Recent work has sought to understand how the inferior colliculus (IC) processes sound. Here, we use neural temporal correlation in the inferior colliculus to identify and categorise the sound that was used as a stimulus. Classification accuracy gradually deteriorates for shorter durations, so we tried to improve it with deep learning methods for audio, using processing windows of 62.5 ms, 250 ms and 1000 ms. We demonstrate that 62.5 ms could be an integration time for temporal correlation. The neural data contain sound features that can be processed easily by artificial neural networks dedicated to audio signals. Network architectures dedicated to audio classification, such as YAMNet, VGGish and OpenL3, used in transfer learning, classify the neural data quickly and with very high accuracy compared with image classification networks. With unshuffled correlation images we already obtain high accuracy; with noiseless, shuffled correlation images we obtain the best accuracy: 100% for 1000 ms, 96.7% for 250 ms and 93.8% for 62.5 ms, with the OpenL3 network. To evaluate how much the input features of a neural network contribute to its outputs, we use Explainable Artificial Intelligence, applying three explainability methods, Grad-CAM, LIME and Occlusion Sensitivity, to obtain three sensitivity maps. The network uses regions of very high or very low correlation to make its predictions.
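As a rough illustration of the correlation-based input described in the abstract, the sketch below builds channel-by-channel temporal correlation "images" from windowed multi-unit activity over the three window lengths mentioned above. The 32-channel layout, the 1 kHz sampling rate and the simulated data are assumptions for illustration only; the paper's recordings and preprocessing may differ.

```python
# Sketch: temporal correlation "images" from windowed multi-unit activity.
# Channel count, sampling rate and the random data are illustrative assumptions.
import numpy as np

def correlation_images(mua, fs=1000.0, window_ms=62.5):
    """Split multi-unit activity (channels x samples) into non-overlapping
    windows and compute one channel-by-channel Pearson correlation matrix
    per window."""
    win = int(round(window_ms * fs / 1000.0))      # samples per window
    n_windows = mua.shape[1] // win
    images = []
    for w in range(n_windows):
        segment = mua[:, w * win:(w + 1) * win]    # channels x win
        images.append(np.corrcoef(segment))        # channels x channels
    return np.stack(images)                        # windows x ch x ch

# Example: 32 recording channels, 4 s of simulated activity at 1 kHz.
rng = np.random.default_rng(0)
mua = rng.standard_normal((32, 4000))
imgs = correlation_images(mua, fs=1000.0, window_ms=62.5)
print(imgs.shape)  # (64, 32, 32): one correlation image per ~62.5 ms window
```

The abstract also names Occlusion Sensitivity as one of the three explanation methods. Below is a generic, hedged sketch of an occlusion-sensitivity map for an image classifier; `predict_fn`, the patch size and the dummy classifier are placeholders for illustration, not the authors' implementation.

```python
# Sketch of an occlusion-sensitivity map: occlude patches of the input and
# measure how much the probability of the target class drops.
import numpy as np

def occlusion_sensitivity(image, predict_fn, target_class, patch=8, stride=4,
                          fill_value=0.0):
    """Slide a square patch over the image, occlude it, and record the drop
    in the target-class probability; larger drop = more important region."""
    h, w = image.shape[:2]
    base = predict_fn(image[None])[0, target_class]
    heatmap = np.zeros((h, w))
    counts = np.zeros((h, w))
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch, ...] = fill_value
            prob = predict_fn(occluded[None])[0, target_class]
            heatmap[y:y + patch, x:x + patch] += base - prob
            counts[y:y + patch, x:x + patch] += 1
    return heatmap / np.maximum(counts, 1)

# Toy usage with a dummy "classifier" that scores images by mean intensity.
def dummy_predict(batch):
    score = batch.reshape(len(batch), -1).mean(axis=1)
    return np.stack([score, 1 - score], axis=1)

corr_image = np.random.default_rng(1).random((32, 32))
sens = occlusion_sensitivity(corr_image, dummy_predict, target_class=0)
print(sens.shape)  # (32, 32) sensitivity map over the correlation image
```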
Pages: 1193-1204
Page count: 12
Related papers
37 records in total
  • [21] The time course of sound category identification: Insights from acoustic features
    Ogg, Mattson
    Slevc, L. Robert
    Idsardi, William J.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2017, 142 (06): 3459 - 3473
  • [22] SoundCount: Sound Counting from Raw Audio with Dyadic Decomposition Neural Network
    He, Yuhang
    Dai, Zhuangzhuang
    Trigoni, Niki
    Chen, Long
    Markham, Andrew
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 11, 2024: 12421 - 12429
  • [23] Impact Sound-Based Surface Identification Using Smart Audio Sensors With Deep Neural Networks
    Ryu, Semin
    Kim, Seung-Chan
    IEEE SENSORS JOURNAL, 2020, 20 (18) : 10936 - 10944
  • [24] Neural correlates and mechanisms of spatial release from masking: Single-unit and population responses in the inferior colliculus
    Lane, CC
    Delgutte, B
    JOURNAL OF NEUROPHYSIOLOGY, 2005, 94 (02) : 1180 - 1198
  • [25] Decoding of the Sound Frequency from the Steady-state Neural Activities in Rat Auditory Cortex
    Shiramatsu, Tomoyo I.
    Noda, Takahiro
    Kanzaki, Ryohei
    Takahashi, Hirokazu
    2013 35TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2013: 5598 - 5601
  • [26] KNOWLEDGE TRANSFER FROM WEAKLY LABELED AUDIO USING CONVOLUTIONAL NEURAL NETWORK FOR SOUND EVENTS AND SCENES
    Kumar, Anurag
    Khadkevich, Maksim
    Fugen, Christian
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 326 - 330
  • [27] Topic Identification from Audio Recordings using Rich Recognition Results and Neural Network based Classifiers
    Gemello, Roberto
    Mana, Franco
    Batzu, Pier Domenico
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011: 2156 - 2159
  • [28] Application of Convolutional Neural Network for Decoding of 12-Lead Electrocardiogram from a Frequency-Modulated Audio Stream (Sonified ECG)
    Krasteva, Vessela
    Iliev, Ivo
    Tabakov, Serafim
    SENSORS, 2024, 24 (06)
  • [29] On Explainable Closed-Set Source Device Identification Using Log-Mel Spectrograms From Videos' Audio: A Grad-CAM Approach
    Korgialas, Christos
    Tzolopoulos, Georgios
    Kotropoulos, Constantine
    IEEE ACCESS, 2024, 12 : 121822 - 121836
  • [30] A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells
    Brown, EN
    Frank, LM
    Tang, DD
    Quirk, MC
    Wilson, MA
    JOURNAL OF NEUROSCIENCE, 1998, 18 (18): 7411 - 7425