Vehicle classification based on audio-visual feature fusion with low-quality images and noise

Cited by: 1
Authors
Zhao, Yiming [1]
Zhao, Hongdong [1]
Zhang, Xuezhi [1]
Liu, Weina [1]
Affiliations
[1] Hebei Univ Technol, Sch Elect Informat & Engn, Tianjin, Peoples R China
Keywords
Vehicle classification; feature fusion; convolutional neural network; low-quality images
DOI
10.3233/JIFS-232812
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In intelligent transport systems, vision is the primary mode of perception. However, vehicle images captured by low-cost traffic cameras under challenging weather conditions often suffer from poor resolution and insufficient detail. Vehicle noise, on the other hand, provides complementary auditory features that offer advantages such as environmental adaptability and a large recognition distance. To address these limitations and improve the accuracy of classification and identification in low-quality traffic surveillance, an effective audio-visual feature fusion method is crucial. This paper establishes an Urban Road Vehicle Audio-visual (URVAV) dataset of low-quality images and noise recorded in complex weather conditions. For low-quality vehicle image classification, the paper proposes a simple Convolutional Neural Network (CNN)-based model called Low-quality Vehicle Images Net (LVINet). To further improve classification accuracy, a spatial channel attention-based audio-visual feature fusion method is introduced. This method converts one-dimensional acoustic features into a two-dimensional audio Mel-spectrogram, which allows auditory and visual features to be fused. By leveraging the high correlation between these features, the method enhances the representation of vehicle characteristics. Experimental results show that LVINet achieves a classification accuracy of 93.62% with fewer parameters than existing CNN models. Furthermore, the proposed audio-visual feature fusion method improves classification accuracy by 7.02% and 4.33% over using audio or visual features alone, respectively.
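The conversion of a one-dimensional audio signal into a two-dimensional Mel-spectrogram, as mentioned in the abstract, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the frame and hop sizes, the 40 Mel bands, the synthetic 440 Hz tone, and the blank 64x64 image are all hypothetical. The final step is a naive channel stacking that stands in for fusion; it is not the spatial channel attention method of the paper, whose details the abstract does not give.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style Mel scale formula.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=40):
    """Turn a 1-D signal into a 2-D log-Mel spectrogram (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    return (10.0 * np.log10(mel + 1e-10)).T               # decibel scale

def resize_nearest(arr, out_h, out_w):
    # Nearest-neighbour resize so the spectrogram matches the image size.
    rows = np.arange(out_h) * arr.shape[0] // out_h
    cols = np.arange(out_w) * arr.shape[1] // out_w
    return arr[rows][:, cols]

# Hypothetical inputs: one second of a 440 Hz tone and a blank RGB image.
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2.0 * np.pi * 440.0 * t)
image = np.zeros((64, 64, 3))

spec = log_mel_spectrogram(audio, sr)
spec_img = resize_nearest(spec, 64, 64)
spec_img = (spec_img - spec_img.min()) / (spec_img.max() - spec_img.min() + 1e-12)

# Naive fusion placeholder: append the spectrogram as a fourth channel.
fused = np.concatenate([image, spec_img[..., None]], axis=-1)
print(spec.shape, fused.shape)
```

Once the audio is in image-like form, both modalities can be fed to convolutional layers. A learned attention module would then weight channels and spatial positions instead of simply stacking them.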
Pages: 8931-8944
Page count: 14