Vehicle classification based on audio-visual feature fusion with low-quality images and noise

Cited by: 1
Authors
Zhao, Yiming [1]
Zhao, Hongdong [1]
Zhang, Xuezhi [1]
Liu, Weina [1]
Affiliations
[1] Hebei Univ Technol, Sch Elect Informat & Engn, Tianjin, Peoples R China
Keywords
Vehicle classification; feature fusion; convolutional neural network; low-quality images
DOI
10.3233/JIFS-232812
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In Intelligent Transport Systems, vision is the primary mode of perception. However, vehicle images captured by low-cost traffic cameras in adverse weather often suffer from low resolution and insufficient detail. Vehicle noise, by contrast, provides complementary auditory features that adapt well to environmental conditions and can be recognized at long range. An effective audio-visual feature fusion method is therefore crucial for overcoming the limitations of low-quality images and improving the accuracy of vehicle classification and identification in traffic surveillance. This paper establishes an Urban Road Vehicle Audio-visual (URVAV) dataset of low-quality images and noise recorded in complex weather conditions. For low-quality vehicle image classification, it proposes a simple Convolutional Neural Network (CNN)-based model called Low-quality Vehicle Images Net (LVINet). To further improve classification accuracy, it introduces a spatial channel attention-based audio-visual feature fusion method. The method converts one-dimensional acoustic features into a two-dimensional audio Mel-spectrogram so that auditory and visual features can be fused. Exploiting the high correlation between these features enhances the representation of vehicle characteristics. Experimental results show that LVINet achieves a classification accuracy of 93.62% with fewer parameters than existing CNN models. Furthermore, the proposed audio-visual feature fusion method improves classification accuracy by 7.02% and 4.33% compared with using audio or visual features alone, respectively.
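The abstract's step of turning a one-dimensional acoustic signal into a two-dimensional Mel-spectrogram can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the sampling rate, FFT size, hop length, number of Mel bands, and the function names `mel_filterbank` and `mel_spectrogram` are all assumptions chosen for illustration, and the abstract does not state the settings actually used.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style Hz-to-mel conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Convert a 1-D signal into a 2-D log-mel image (n_mels x n_frames).

    All default parameters are illustrative assumptions.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    # Power spectrum of each frame: shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Project onto mel bands and transpose to (n_mels, n_frames).
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T
    # Log compression; the small constant avoids log(0).
    return 10.0 * np.log10(mel + 1e-10)

# Usage: one second of a synthetic 440 Hz tone standing in for vehicle noise.
t = np.arange(16000) / 16000.0
image = mel_spectrogram(np.sin(2 * np.pi * 440 * t))
print(image.shape)  # (40, 61)
```

The resulting array has the same two-dimensional layout as a single-channel image, which is what makes it possible to process audio with the same kind of convolutional layers used for the visual branch before the two feature sets are fused.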
Pages: 8931-8944
Number of pages: 14