An Audio Data Representation for Traffic Acoustic Scene Recognition

Cited by: 8
Authors
Jiang, Dazhi [1, 2]
Huang, Dongmin [1]
Song, Youyi [3]
Wu, Kaichao [1]
Lu, Huakang [1]
Liu, Quanquan [1]
Zhou, Teng [1, 2, 3]
Affiliations
[1] Shantou Univ, Coll Engn, Dept Comp Sci, Shantou 515063, Peoples R China
[2] Shantou Univ, Key Lab Intelligent Mfg Technol, Minist Educ, Shantou 515063, Peoples R China
[3] Hong Kong Polytech Univ, Ctr Smart Hlth, Sch Nursing, Hong Kong, Peoples R China
Keywords
Acoustics; Feature extraction; Spectrogram; Transforms; Histograms; Time-frequency analysis; Visualization; acoustic scene recognition; transportation; acoustic material; HEALTH;
DOI
10.1109/ACCESS.2020.3027474
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Acoustic scene recognition (ASR), the task of recognizing an acoustic environment from an audio recording of the scene, has a wide range of applications, e.g., robotic navigation and audio forensics. However, ASR remains challenging, mainly because audio data are difficult to represent. In this article, we focus on traffic acoustic data. Traffic acoustic scene recognition provides information complementary to the visual information of the scene; for example, it can be used to verify a visual perception result. Because acoustic analysis and recognition are simple and convenient, they can effectively enhance perception that otherwise relies on visual information alone. We propose an audio data representation method to improve traffic acoustic scene recognition accuracy. The proposed method employs the constant Q transform (CQT) and the histogram of gradient (HOG) to transform one-dimensional audio signals into a time-frequency representation. We also propose two data representation mechanisms, called global and local feature selection, to select features that describe the shape of time-frequency structures. Finally, we apply the least absolute shrinkage and selection operator (LASSO) technique to select the most representative information and thereby further improve recognition accuracy. We conducted extensive experiments, and the results show that the proposed method is effective and significantly outperforms the state-of-the-art methods.
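The gradient-histogram step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a time-frequency matrix (for example, CQT magnitudes) has already been computed, it pools one histogram over the whole matrix instead of using the global and local mechanisms of the paper, and the function name `orientation_histogram` and the choice of 9 bins are hypothetical.

```python
import math


def orientation_histogram(tf, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    tf is a list of rows (frequency bins), each a list of values over time.
    Orientations are folded into [0, 180) degrees, as in HOG-style features.
    Hypothetical helper for illustration only.
    """
    rows, cols = len(tf), len(tf[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    # Central differences on interior points only (borders are skipped).
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = tf[r][c + 1] - tf[r][c - 1]  # change along the time axis
            gy = tf[r + 1][c] - tf[r - 1][c]  # change along the frequency axis
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue  # flat region contributes nothing
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Clamp guards against a floating-point angle of exactly 180.0.
            idx = min(int(angle // bin_width), n_bins - 1)
            hist[idx] += mag
    total = sum(hist)
    # Normalize to sum to 1 so the descriptor does not depend on signal level.
    return [h / total for h in hist] if total > 0 else hist


# Toy input: energy that rises steadily over time in every frequency bin.
ramp_over_time = [[float(c) for c in range(6)] for _ in range(5)]
descriptor = orientation_histogram(ramp_over_time)
```

For the toy input, all gradient energy points along the time axis, so the whole descriptor mass falls in the first orientation bin. In a fuller pipeline, descriptors of this kind would be the candidate features from which a sparse selector such as LASSO keeps a subset.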
Pages: 177863-177873
Page count: 11
Related papers
50 in total
  • [1] Audio Event-Relational Graph Representation Learning for Acoustic Scene Classification
    Hou, Yuanbo
    Song, Siyang
    Yu, Chuang
    Wang, Wenwu
    Botteldooren, Dick
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1382 - 1386
  • [2] Audio Bank: A High-Level Acoustic Signal Representation for Audio Event Recognition
    Sandhan, Tushar
    Sonowal, Sukanya
    Choi, Jin Young
    [J]. 2014 14TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2014), 2014, : 82 - 87
  • [3] Effect of Acoustic Scene Complexity and Visual Scene Representation on Auditory Perception in Virtual Audio-Visual Environments
    Fichna, Stefan
    Biberger, Thomas
    Seeber, Bernhard U.
    Ewert, Stephan D.
    [J]. 2021 IMMERSIVE AND 3D AUDIO: FROM ARCHITECTURE TO AUTOMOTIVE (I3DA), 2021,
  • [4] PATTERN-RECOGNITION BY AUDIO REPRESENTATION OF MULTIVARIATE ANALYTICAL DATA
    YEUNG, ES
    [J]. ANALYTICAL CHEMISTRY, 1980, 52 (07) : 1120 - 1123
  • [5] Supervised Representation Learning for Audio Scene Classification
    Rakotomamonjy, Alain
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (06) : 1253 - 1265
  • [6] Audio scene recognition based on audio events and topic model
    Leng, Yan
    Zhou, Nai
    Sun, Chengli
    Xu, Xinyan
    Yuan, Qi
    Cheng, Chuanfu
    Liu, Yunxia
    Li, Dengwang
    [J]. KNOWLEDGE-BASED SYSTEMS, 2017, 125 : 1 - 12
  • [7] Acoustic Scene Classification using Audio Tagging
    Jung, Jee-weon
    Shim, Hye-jin
    Kim, Ju-ho
    Kim, Seung-bin
    Yu, Ha-Jin
    [J]. INTERSPEECH 2020, 2020, : 1176 - 1180
  • [8] Traffic Scene Classification on a Representation Budget
    Sikiric, Ivan
    Brkic, Karla
    Bevandic, Petra
    Kreso, Ivan
    Krapac, Josip
    Segvic, Sinisa
    [J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2020, 21 (01) : 336 - 345
  • [9] Hapto-acoustic Scene Representation
    Ritterbusch, Sebastian
    Constantinescu, Angela
    Koch, Volker
    [J]. COMPUTERS HELPING PEOPLE WITH SPECIAL NEEDS, PT II, 2012, 7383 : 644 - 650
  • [10] MOVIE AUDIO SCENE RECOGNITION BASED ON WFST
    Yang, Jichen
    Cai, Min
    Li, Yanxiong
    Jin, Hai
    [J]. PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), 2016, : 77 - 80