An Audio Data Representation for Traffic Acoustic Scene Recognition

Cited by: 8
Authors
Jiang, Dazhi [1, 2]
Huang, Dongmin [1 ]
Song, Youyi [3 ]
Wu, Kaichao [1 ]
Lu, Huakang [1 ]
Liu, Quanquan [1 ]
Zhou, Teng [1, 2, 3]
Affiliations
[1] Shantou Univ, Coll Engn, Dept Comp Sci, Shantou 515063, Peoples R China
[2] Shantou Univ, Key Lab Intelligent Mfg Technol, Minist Educ, Shantou 515063, Peoples R China
[3] Hong Kong Polytech Univ, Ctr Smart Hlth, Sch Nursing, Hong Kong, Peoples R China
Keywords
Acoustics; Feature extraction; Spectrogram; Transforms; Histograms; Time-frequency analysis; Visualization; acoustic scene recognition; transportation; acoustic material; HEALTH
DOI
10.1109/ACCESS.2020.3027474
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Acoustic scene recognition (ASR), the task of recognizing an acoustic environment from an audio recording of the scene, has a wide range of applications, e.g. robotic navigation and audio forensics. However, ASR remains challenging, mainly because audio data are difficult to represent. In this article, we focus on traffic acoustic data. Traffic acoustic scene recognition provides information complementary to the visual information of the scene; for example, it can be used to verify a visual perception result. Because acoustic analysis and recognition are simple and convenient, they can effectively enhance perception that would otherwise rely on visual information alone. We propose an audio data representation method to improve traffic acoustic scene recognition accuracy. The proposed method employs the constant Q transform (CQT) and the histogram of oriented gradients (HOG) to transform one-dimensional audio signals into a time-frequency representation. We also propose two data representation mechanisms, called global and local feature selection, to select features that describe the shape of time-frequency structures. Finally, we apply the least absolute shrinkage and selection operator (LASSO) to further improve recognition accuracy by selecting the most representative information for recognition. We conducted extensive experiments, and the results show that the proposed method is effective, significantly outperforming the state-of-the-art methods.
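To make the idea of describing the shape of time-frequency structures with gradient histograms more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small 2-D matrix of magnitudes standing in for a CQT output (rows as frequency bins, columns as time frames) and computes a single HOG-style orientation histogram over it. The function name `hog_cell_histogram`, the bin count, and the toy patches are all hypothetical choices made for illustration; the global and local feature selection and the LASSO step from the abstract are not shown.

```python
import math


def hog_cell_histogram(tf, n_bins=8):
    """Magnitude-weighted histogram of gradient orientations for one patch.

    tf is a list of rows (frequency bins), each a list of values over time.
    This is an illustrative stand-in for a CQT magnitude patch.
    """
    rows, cols = len(tf), len(tf[0])
    hist = [0.0] * n_bins
    # Central differences on interior points only (borders are skipped).
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (tf[r][c + 1] - tf[r][c - 1]) / 2.0  # change along time
            gy = (tf[r + 1][c] - tf[r - 1][c]) / 2.0  # change along frequency
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Unsigned orientation folded into [0, pi).
            angle = math.atan2(gy, gx) % math.pi
            b = min(int(angle / math.pi * n_bins), n_bins - 1)
            hist[b] += mag
    total = sum(hist)
    # Normalise so the histogram sums to 1 (left as zeros for a flat patch).
    return [h / total for h in hist] if total > 0 else hist


# Toy patches (assumed data): energy varying only with frequency,
# and energy varying only with time.
ridge = [[float(r)] * 6 for r in range(6)]
onset = [[float(c) for c in range(6)] for _ in range(6)]

print(hog_cell_histogram(ridge))
print(hog_cell_histogram(onset))
```

In this toy setting the patch that varies only along frequency places all of its weight in the bin around 90 degrees, while the patch that varies only along time places it in the bin at 0 degrees, which is the sense in which such histograms capture the shape of time-frequency structures.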
Pages: 177863-177873
Number of pages: 11
Related papers
50 in total
  • [21] Scene recognition using multiple representation network
    Lin, Chaowei
    Lee, Feifei
    Xie, Lin
    Cai, Jiawei
    Chen, Hanqing
    Liu, Li
    Chen, Qiu
    APPLIED SOFT COMPUTING, 2022, 118
  • [22] Arrangement based image representation for scene recognition
    Somanath, Gowri
    Kambhamettu, Chandra
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 2436 - 2439
  • [23] SEMANTIC SEGMENTATION AS IMAGE REPRESENTATION FOR SCENE RECOGNITION
    Bassiouny, Ahmed
    El-Saban, Motaz
    2014 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2014, : 981 - 985
  • [24] Primitive Representation Learning for Scene Text Recognition
    Yan, Ruijie
    Peng, Liangrui
    Xiao, Shanyu
    Yao, Gang
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 284 - 293
  • [25] Weakly Supervised Representation Learning for Audio-Visual Scene Analysis
    Parekh, Sanjeel
    Essid, Slim
    Ozerov, Alexey
    Ngoc Q K Duong
    Perez, Patrick
    Richard, Gael
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 (28) : 416 - 428
  • [26] Pedestrian traffic lights recognition in a scene using a PDA
    Eddowes, DM
    Krahe, JL
    Proceedings of the Fourth IASTED International Conference on Visualization, Imaging, and Image Processing, 2004, : 578 - 583
  • [27] Deep multiple classifier fusion for traffic scene recognition
    Fangyu Wu
    Shiyang Yan
    Jeremy S. Smith
    Bailing Zhang
    Granular Computing, 2021, 6 : 217 - 228
  • [28] Acoustic Scene Classification Using Deep Audio Feature and BLSTM Network
    Li, Yanxiong
    Li, Xianku
    Zhang, Yuhan
    Wang, Wucheng
    Liu, Mingle
    Feng, Xiaohui
    2018 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), 2018, : 371 - 374
  • [29] Deep multiple classifier fusion for traffic scene recognition
    Wu, Fangyu
    Yan, Shiyang
    Smith, Jeremy S.
    Zhang, Bailing
    GRANULAR COMPUTING, 2021, 6 (01) : 217 - 228
  • [30] Traffic Light Recognition for Complex Scene With Fusion Detections
    Li, Xi
    Ma, Huimin
    Wang, Xiang
    Zhang, Xiaoqin
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2018, 19 (01) : 199 - 208