An Audio Data Representation for Traffic Acoustic Scene Recognition

Cited by: 8
Authors
Jiang, Dazhi [1 ,2 ]
Huang, Dongmin [1 ]
Song, Youyi [3 ]
Wu, Kaichao [1 ]
Lu, Huakang [1 ]
Liu, Quanquan [1 ]
Zhou, Teng [1 ,2 ,3 ]
Affiliations
[1] Shantou Univ, Coll Engn, Dept Comp Sci, Shantou 515063, Peoples R China
[2] Shantou Univ, Key Lab Intelligent Mfg Technol, Minist Educ, Shantou 515063, Peoples R China
[3] Hong Kong Polytech Univ, Ctr Smart Hlth, Sch Nursing, Hong Kong, Peoples R China
Keywords
Acoustics; Feature extraction; Spectrogram; Transforms; Histograms; Time-frequency analysis; Visualization; acoustic scene recognition; transportation; acoustic material; HEALTH;
DOI
10.1109/ACCESS.2020.3027474
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Acoustic scene recognition (ASR), the task of recognizing an acoustic environment from an audio recording of the scene, has a wide range of applications, e.g., robotic navigation and audio forensics. However, ASR remains challenging, mainly because audio data are difficult to represent. In this article, we focus on traffic acoustic data. Traffic acoustic scene recognition provides information complementary to the visual information of the scene; for example, it can be used to verify a visual perception result. Because acoustic analysis and recognition are simple and convenient, they can effectively enhance perception systems that rely on visual information alone. We propose an audio data representation method to improve the accuracy of traffic acoustic scene recognition. The proposed method employs the constant Q transform (CQT) and the histogram of gradient (HOG) to transform one-dimensional audio signals into a time-frequency representation. We also propose two data representation mechanisms, called global and local feature selection, to select features that can describe the shape of time-frequency structures. Finally, we apply the least absolute shrinkage and selection operator (LASSO) technique to further improve recognition accuracy by selecting the most representative information for recognition. We conducted extensive experiments, and the results show that the proposed method is effective, significantly outperforming state-of-the-art methods.
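The CQT and HOG steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `naive_cqt` and `hog_cells`, the direct-summation CQT, the per-cell L2 normalization, and every parameter value (`fmin`, `n_bins`, `hop`, `cell`, `n_orient`) are assumptions chosen for illustration. The global and local feature selection and the LASSO step are not shown.

```python
import cmath
import math


def naive_cqt(signal, sr, fmin=55.0, n_bins=24, bins_per_octave=12, hop=256):
    """Magnitude constant-Q transform by direct summation (illustrative only).

    Returns n_bins rows (frequency), each with one magnitude per frame (time).
    """
    # Constant quality factor: centre frequency divided by bandwidth.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    n_frames = 1 + max(0, len(signal) - 1) // hop
    rows = []
    for k in range(n_bins):
        f_k = fmin * 2.0 ** (k / bins_per_octave)  # geometrically spaced bins
        n_k = int(math.ceil(q * sr / f_k))         # window shrinks as f_k grows
        # Hann-windowed complex exponential tuned to bin k.
        kernel = [
            (0.5 - 0.5 * math.cos(2 * math.pi * n / n_k))
            * cmath.exp(-2j * math.pi * q * n / n_k) / n_k
            for n in range(n_k)
        ]
        row = []
        for t in range(n_frames):
            seg = signal[t * hop:t * hop + n_k]
            # zip() stops at the shorter input, i.e. implicit zero padding.
            row.append(abs(sum(x * c for x, c in zip(seg, kernel))))
        rows.append(row)
    return rows


def hog_cells(image, cell=(4, 4), n_orient=8):
    """Per-cell histograms of unsigned gradient orientation, L2-normalized.

    Simplified: no overlapping block normalization as in full HOG.
    """
    rows, cols = len(image), len(image[0])
    ch, cw = cell
    features = []
    for r0 in range(0, rows - ch + 1, ch):
        for c0 in range(0, cols - cw + 1, cw):
            hist = [0.0] * n_orient
            for r in range(r0, r0 + ch):
                for c in range(c0, c0 + cw):
                    # Central differences, clamped at the image borders.
                    gy = image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c]
                    gx = image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)]
                    theta = math.atan2(gy, gx) % math.pi  # fold into [0, pi)
                    b = min(int(theta / math.pi * n_orient), n_orient - 1)
                    hist[b] += math.hypot(gx, gy)  # vote weighted by magnitude
            norm = math.sqrt(sum(h * h for h in hist)) + 1e-12
            features.extend(h / norm for h in hist)
    return features


# Synthetic input (an assumption, not traffic data): a 220 Hz tone, 0.5 s.
sr = 8000
tone = [math.sin(2 * math.pi * 220.0 * n / sr) for n in range(sr // 2)]
cqt = naive_cqt(tone, sr, fmin=110.0, n_bins=24, bins_per_octave=12, hop=256)
features = hog_cells(cqt, cell=(4, 4), n_orient=8)
# With fmin = 110 Hz and 12 bins per octave, 220 Hz corresponds to bin 12.
peak_bin = max(range(len(cqt)), key=lambda k: cqt[k][0])
print(len(cqt), len(cqt[0]), len(features), peak_bin)
```

In practice a library CQT and HOG would replace these loops; the pure-Python version only makes the two transforms explicit: a one-dimensional signal becomes a time-frequency matrix, and that matrix becomes a vector of local gradient-orientation histograms describing the shape of its structures.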
Pages: 177863-177873
Number of pages: 11
Related papers
50 in total
  • [31] Imperceptible adversarial attacks against traffic scene recognition
    Zhu, Yinghui
    Jiang, Yuzhen
    SOFT COMPUTING, 2021, 25 (20) : 13069 - 13077
  • [32] Scene-Aware Audio Rendering via Deep Acoustic Analysis
    Tang, Zhenyu
    Bryan, Nicholas J.
    Li, Dingzeyu
    Langlois, Timothy R.
    Manocha, Dinesh
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2020, 26 (05) : 1991 - 2001
  • [33] Chinese Traffic Police Gesture Recognition in Complex Scene
    Guo, Fan
    Cai, Zixing
    Tang, Jin
    TRUSTCOM 2011: 2011 INTERNATIONAL JOINT CONFERENCE OF IEEE TRUSTCOM-11/IEEE ICESS-11/FCST-11, 2011, : 1505 - 1511
  • [34] Large scale data based audio scene classification
    Sophiya E.
    Jothilakshmi S.
    International Journal of Speech Technology, 2018, 21 (04) : 825 - 836
  • [35] Vector Symbolic Scene Representation for Semantic Place Recognition
    Kirilenko, Daniil
    Kovalev, Alexey K.
    Solomentsev, Yaroslav
    Melekhin, Alexander
    Yudin, Dmitry A.
    Panov, Aleksandr I.
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [36] A Discriminative Representation of Convolutional Features for Indoor Scene Recognition
    Khan, Salman H.
    Hayat, Munawar
    Bennamoun, Mohammed
    Togneri, Roberto
    Sohel, Ferdous A.
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, 25 (07) : 3372 - 3383
  • [37] Acoustic Scene Recognition Based on Convolutional Neural Networks
    Sun, Fengjiao
    Wang, Mingjiang
    Xu, Qihang
    Xuan, Xiaogung
    Zhang, Xin
    2019 IEEE 4TH INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP 2019), 2019, : 122 - 126
  • [38] Data Augmentation for Scene Text Recognition
    Atienza, Rowel
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 1561 - 1570
  • [39] Hierarchical Sparse Representation for Traffic Sign Recognition
    Fan, Yaxiang
    Sun, Hao
    Zhou, Shilin
    Zou, Huanxin
    PROCEEDINGS OF 2013 CHINESE INTELLIGENT AUTOMATION CONFERENCE: INTELLIGENT INFORMATION PROCESSING, 2013, 256 : 653 - 660
  • [40] Acoustic Scene Classification using Binaural Representation and Classifier Combination
    Arabnezhad, Fatemeh
    Nasersharif, Babak
    2019 9TH INTERNATIONAL CONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE 2019), 2019, : 351 - 355