Vehicle classification based on audio-visual feature fusion with low-quality images and noise

Cited by: 1
Authors
Zhao, Yiming [1]
Zhao, Hongdong [1]
Zhang, Xuezhi [1]
Liu, Weina [1]
Affiliations
[1] Hebei Univ Technol, Sch Elect Informat & Engn, Tianjin, Peoples R China
Keywords
Vehicle classification; feature fusion; convolutional neural network; low-quality images
DOI
10.3233/JIFS-232812
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In Intelligent Transport Systems, vision is the primary mode of perception. However, vehicle images captured by low-cost traffic cameras under challenging weather conditions often suffer from poor resolution and insufficient detail. Vehicle noise, by contrast, provides complementary auditory features that offer advantages such as environmental adaptability and a large recognition distance. To address the limitations of low-quality images and improve the accuracy of classification and identification in low-quality traffic surveillance, an effective audio-visual feature fusion method is crucial. This paper establishes an Urban Road Vehicle Audio-visual (URVAV) dataset specifically designed for low-quality images and noise recorded in complex weather conditions. For low-quality vehicle image classification, the paper proposes a simple Convolutional Neural Network (CNN)-based model called Low-quality Vehicle Images Net (LVINet). To further improve classification accuracy, a spatial channel attention-based audio-visual feature fusion method is introduced. This method converts one-dimensional acoustic features into a two-dimensional audio Mel-spectrogram, allowing auditory and visual features to be fused. Leveraging the high correlation between these features effectively enhances the representation of vehicle characteristics. Experimental results demonstrate that LVINet achieves a classification accuracy of 93.62% with fewer parameters than existing CNN models. Furthermore, the proposed audio-visual feature fusion method improves classification accuracy by 7.02% and 4.33% over using audio or visual features alone, respectively.
Pages: 8931-8944
Number of pages: 14
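
The abstract describes converting a one-dimensional audio signal into a two-dimensional Mel-spectrogram so that it can be fused with image features. The following is a minimal NumPy sketch of that conversion only, not the authors' implementation. The sample rate, FFT size, hop length, number of Mel bands, the Hann window, and the HTK-style Mel formula are all assumptions chosen for illustration; none of them come from the record above.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale (an assumed choice, not taken from the paper)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters mapping n_fft//2 + 1 rFFT bins to n_mels bands."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        # Triangle: rises from lo to ctr, falls from ctr to hi, zero elsewhere
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Return a log-power Mel-spectrogram of shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft//2 + 1)
    mel = mel_filterbank(n_mels, n_fft, sr) @ power.T   # (n_mels, n_frames)
    return 10.0 * np.log10(mel + 1e-10)                 # small offset avoids log(0)

# Hypothetical input: one second of a 440 Hz tone standing in for vehicle noise
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
spec = mel_spectrogram(tone, sr=sr)
print(spec.shape)  # (40, 61)
```

The resulting two-dimensional array can be treated like a single-channel image, which is what makes it possible to feed audio into a CNN branch alongside the camera image.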