PerceptGuide: A Perception Driven Assistive Mobility Aid Based on Self-Attention and Multi-Scale Feature Fusion

Cited by: 1
Authors
Madake, Jyoti [1 ]
Bhatlawande, Shripad [1 ]
Solanke, Anjali [2 ]
Shilaskar, Swati [1 ]
Affiliations
[1] Vishwakarma Inst Technol, Pune 411037, India
[2] Marathwada Mitra Mandals Coll Engn, Pune 411052, India
Keywords
Blind assistive; mobility aid; scene understanding; wearable aid; ResNet-50; feature fusion; self-attention; multilayer GRU; SYSTEM; BLIND
DOI
10.1109/ACCESS.2023.3314702
Chinese Library Classification
TP [automation technology; computer technology]
Subject classification code
0812
Abstract
The paper introduces PerceptGuide, a novel wearable aid that helps visually impaired individuals perceive the scene around them. It is designed as a lightweight, wearable chest-rig bag that incorporates a monocular camera, ultrasonic sensors, vibration motors, and a mono earphone, powered by an embedded Nvidia Jetson development board. The system provides directional obstacle alerts through the vibration motors, allowing users to avoid obstacles in their path. A user-friendly push-button lets the user request information about the scene in front of them. The scene details are conveyed through a novel scene-understanding approach that combines multi-scale feature fusion, self-attention, and a multilayer GRU (Gated Recurrent Unit) architecture on a ResNet-50 backbone. The proposed system generates coherent and descriptive captions by capturing image features at multiple scales, enhancing the quality and contextual understanding of the scene details. Self-attention in both the encoder (ResNet-50 with feature fusion) and the decoder (multilayer GRU) captures long-range dependencies and attends to relevant image regions. Quantitative evaluations on the MSCOCO and Flickr8k datasets show the effectiveness of the model, with improved scores of BLEU 67.7, ROUGE-L 47.6, METEOR 22.7, and CIDEr 67.4. The PerceptGuide system exhibits strong real-time performance, generating audible captions in just 1.5 to 2 seconds; this rapid response significantly aids visually impaired individuals in understanding the scenes around them. The qualitative evaluation of the aid emphasizes its real-time performance, demonstrating the generation of context-aware, semantically meaningful captions. This validates its potential as a wearable assistive aid for visually impaired people, with the added advantages of low power consumption, compactness, and a lightweight design.
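The abstract names the building blocks of the captioning pipeline (ResNet-50 backbone, multi-scale feature fusion, self-attention in encoder and decoder, multilayer GRU) but not their wiring. The following is a minimal PyTorch sketch of that style of architecture, not the authors' exact design: the fusion-by-concatenation of the last three ResNet-50 stages, the model width of 512, the 8 attention heads, the 2 GRU layers, and all module names are illustrative assumptions, and the decoder's attention is implemented here as attention from word embeddings to the fused image tokens.

```python
# Illustrative PerceptGuide-style captioning pipeline (assumed design, not
# the paper's reference implementation): ResNet-50 multi-scale features are
# projected to a common width, concatenated, passed through self-attention,
# and decoded by a multilayer GRU that attends to the image tokens.
import torch
import torch.nn as nn
import torchvision.models as models


class MultiScaleEncoder(nn.Module):
    """ResNet-50 backbone; fuses tokens from the last three stages."""

    def __init__(self, d_model=512):
        super().__init__()
        resnet = models.resnet50(weights=None)  # load pretrained weights in practice
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu,
                                  resnet.maxpool, resnet.layer1)
        self.layer2, self.layer3, self.layer4 = resnet.layer2, resnet.layer3, resnet.layer4
        # 1x1 convs project each scale (512/1024/2048 channels) to a common width
        self.proj = nn.ModuleList([nn.Conv2d(c, d_model, 1) for c in (512, 1024, 2048)])
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

    def forward(self, images):
        f2 = self.layer2(self.stem(images))
        f3 = self.layer3(f2)
        f4 = self.layer4(f3)
        # flatten each scale to a (B, H*W, d_model) token sequence and concatenate
        tokens = [p(f).flatten(2).transpose(1, 2)
                  for p, f in zip(self.proj, (f2, f3, f4))]
        fused = torch.cat(tokens, dim=1)
        out, _ = self.attn(fused, fused, fused)  # encoder self-attention
        return out


class GRUDecoder(nn.Module):
    """Multilayer GRU that attends to encoder tokens while generating words."""

    def __init__(self, vocab_size, d_model=512, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.gru = nn.GRU(2 * d_model, d_model, num_layers=num_layers, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, enc_tokens, captions):
        emb = self.embed(captions)                     # (B, T, d_model)
        ctx, _ = self.cross_attn(emb, enc_tokens, enc_tokens)
        hidden, _ = self.gru(torch.cat([emb, ctx], dim=-1))
        return self.out(hidden)                        # per-step vocabulary logits


if __name__ == "__main__":
    enc, dec = MultiScaleEncoder(), GRUDecoder(vocab_size=10000)
    imgs = torch.randn(2, 3, 224, 224)                 # batch of camera frames
    caps = torch.randint(0, 10000, (2, 12))            # teacher-forced caption ids
    print(dec(enc(imgs), caps).shape)                  # torch.Size([2, 12, 10000])
```

A deployed system would additionally need pretrained backbone weights, a tokenizer, cross-entropy training over caption tokens, and greedy or beam-search decoding before the caption is spoken through the earphone.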
Pages: 101167-101182
Number of pages: 16