Recognition and location of marine animal sounds using two-stream ConvNet with attention

Cited by: 3
Authors
Hu, Shaoxiang [1]
Hou, Rong [2]
Liao, Zhiwu [3]
Chen, Peng [2]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Automat Engn, Chengdu, Peoples R China
[2] Chengdu Res Base Giant Panda Breeding, Sichuan Key Lab Conservat Biol Endangered Wildlife, Chengdu, Peoples R China
[3] Sichuan Normal Univ, Acad Global Governance & Area Studies, Chengdu, Peoples R China
Keywords
voice recognition; location; two-stream ConvNet; YOLO; attention; CMFCC; SOURCE LOCALIZATION;
DOI
10.3389/fmars.2023.1059622
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
The ocean holds abundant resources and many endangered marine animals. Using sound to identify and locate these animals effectively, and to estimate their distribution areas, plays an important role in studying the diversity of marine animals (Hanny et al., 2013). We design a Two-Stream ConvNet with Attention (TSCA) model, a two-stream model combined with attention in which one branch processes the time-domain signal and the other processes the frequency-domain signal. It exploits the high time resolution of the time-domain signal and the high recognition rate of frequency-domain sound features to achieve rapid localization and recognition of the sounds of marine species. The basic network architecture of the model is YOLO (You Only Look Once) (Joseph et al., 2016). A new loss function, focal loss, is constructed to strengthen the influence of the tail classes of the sample, overcome data imbalance, and avoid overfitting. In addition, an attention module is constructed to focus on more detailed sound features, improving the noise resistance of the model and achieving high-precision identification and localization of marine species. On the Watkins Marine Mammal Sound Database, the recognition rate of the algorithm reached 92.04% and the positioning accuracy reached 78.4%. The experimental results show that the algorithm has good robustness and high recognition and positioning accuracy.
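The abstract names focal loss as the remedy for class imbalance but does not give its form. As a minimal sketch, assuming the widely used formulation FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), the snippet below shows how the modulating factor shrinks the loss of well-classified samples so that hard samples, typically from tail classes, dominate training. The function names, the default `gamma=2.0` and `alpha=1.0`, and the example probabilities are illustrative assumptions, not values taken from the paper, whose loss may differ in detail.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for a single sample (assumed standard formulation).

    probs  : predicted class probabilities, expected to sum to 1
    target : index of the true class
    gamma  : focusing parameter; gamma = 0 reduces to cross-entropy
    alpha  : constant weighting factor (hypothetical default)
    """
    # Clamp to avoid log(0) when the true class receives zero probability.
    p_t = min(max(probs[target], 1e-12), 1.0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(probs, target):
    """Plain cross-entropy, obtained from focal loss with gamma = 0."""
    return focal_loss(probs, target, gamma=0.0)


# Illustrative predictions for a sample whose true class is index 0.
easy = [0.9, 0.05, 0.05]  # confidently correct
hard = [0.2, 0.5, 0.3]    # misclassified

for name, probs in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(probs, 0)
    fl = focal_loss(probs, 0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f} ratio={fl / ce:.2f}")
```

With `gamma=2.0`, the easy sample keeps 1% of its cross-entropy loss while the hard sample keeps 64%, which is the re-weighting effect the abstract relies on for tail classes.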
Pages: 10
Related papers
50 in total
  • [21] TWO-STREAM HYBRID ATTENTION NETWORK FOR MULTIMODAL CLASSIFICATION
    Chen, Qipin
    Shi, Zhenyu
    Zuo, Zhen
    Fu, Jinmiao
    Sun, Yi
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 359 - 363
  • [22] Traffic Risk Assessment: A Two-Stream Approach Using Dynamic-Attention
    Gary-Patrick, Corcoran
    James, Clark
    2019 16TH CONFERENCE ON COMPUTER AND ROBOT VISION (CRV 2019), 2019, : 166 - 173
  • [23] Enhanced Spatial Stream of Two-Stream Network Using Optical Flow for Human Action Recognition
    Khan, Shahbaz
    Hassan, Ali
    Hussain, Farhan
    Perwaiz, Aqib
    Riaz, Farhan
    Alsabaan, Maazen
    Abdul, Wadood
    APPLIED SCIENCES-BASEL, 2023, 13 (14):
  • [24] Dynamic Gesture Recognition Combining Two-stream 3D Convolution with Attention Mechanisms
    Wang Fenhua
    Zhang Qiang
    Huang Chao
    Zhang Ran
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2021, 43 (05) : 1389 - 1396
  • [25] Recognition of S1 and S2 heart sounds with two-stream convolutional neural networks
    Shen Y.
    Wang X.
    Tang M.
    Liang J.
    Shengwu Yixue Gongchengxue Zazhi/Journal of Biomedical Engineering, 2021, 38 (01) : 138 - 144
  • [26] Two-Stream Emotion Recognition For Call Center Monitoring
    Gupta, Purnima
    Rajput, Nitendra
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1037 - +
  • [27] Two-stream Deep Representation for Human Action Recognition
    Ghrab, Najla Bouarada
    Fendri, Emna
    Hammami, Mohamed
    FOURTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2021), 2022, 12084
  • [28] Improved two-stream model for human action recognition
    Zhao, Yuxuan
    Man, Ka Lok
    Smith, Jeremy
    Siddique, Kamran
    Guan, Sheng-Uei
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2020, 2020 (01)
  • [29] Hidden Two-Stream Convolutional Networks for Action Recognition
    Zhu, Yi
    Lan, Zhenzhong
    Newsam, Shawn
    Hauptmann, Alexander
    COMPUTER VISION - ACCV 2018, PT III, 2019, 11363 : 363 - 378
  • [30] Two-Stream Network for Sign Language Recognition and Translation
    Chen, Yutong
    Zuo, Ronglai
    Wei, Fangyun
    Wu, Yu
    Liu, Shujie
    Mak, Brian
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,