PerceptGuide: A Perception Driven Assistive Mobility Aid Based on Self-Attention and Multi-Scale Feature Fusion

Cited by: 1
Authors
Madake, Jyoti [1 ]
Bhatlawande, Shripad [1 ]
Solanke, Anjali [2 ]
Shilaskar, Swati [1 ]
Affiliations
[1] Vishwakarma Inst Technol, Pune 411037, India
[2] Marathwada Mitra Mandals Coll Engn, Pune 411052, India
Keywords
Blind assistive; mobility aid; scene understanding; wearable aid; ResNet-50; feature fusion; self-attention; multilayer GRU; SYSTEM; BLIND
DOI
10.1109/ACCESS.2023.3314702
Chinese Library Classification
TP [automation technology; computer technology]
Subject classification code
0812
Abstract
The paper introduces PerceptGuide, a novel wearable aid that helps visually impaired individuals perceive the scene around them. It is designed as a lightweight, wearable chest-rig bag that incorporates a monocular camera, ultrasonic sensors, vibration motors, and a mono earphone, powered by an embedded Nvidia Jetson development board. The system provides directional obstacle alerts through the vibration motors, allowing users to avoid obstacles in their path. A user-friendly push-button lets the user request information about the scene in front of them. The scene details are conveyed through a novel scene-understanding approach that combines multi-scale feature fusion, self-attention, and a multilayer GRU (Gated Recurrent Unit) architecture on a ResNet-50 backbone. The proposed system generates coherent and descriptive captions by capturing image features at multiple scales, enhancing the quality and contextual understanding of the scene details. Self-attention in both the encoder (ResNet-50 with feature fusion) and the decoder (multilayer GRU) captures long-range dependencies and attends to relevant image regions. Quantitative evaluations on the MSCOCO and Flickr8k datasets show the effectiveness of the model, with improved scores of BLEU 67.7, ROUGE-L 47.6, METEOR 22.7, and CIDEr 67.4. The PerceptGuide system exhibits strong real-time performance, generating audible captions in just 1.5 to 2 seconds; this rapid response significantly aids visually impaired individuals in understanding the scenes around them. The qualitative evaluation of the aid emphasizes its real-time performance, demonstrating the generation of context-aware, semantically meaningful captions. This validates its potential as a wearable assistive aid for visually impaired people, with the added advantages of low power consumption, compactness, and a lightweight design.
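The abstract names the building blocks of the captioning pipeline (ResNet-50 backbone, multi-scale feature fusion, self-attention in encoder and decoder, multilayer GRU) but not their wiring. The following is a minimal PyTorch sketch of that style of architecture, not the authors' exact design: the fusion-by-concatenation of the last three ResNet-50 stages, the model width of 512, the 8 attention heads, the 2 GRU layers, and all module names are illustrative assumptions, and the decoder's attention is implemented here as attention from word embeddings to the fused image tokens.

```python
# Illustrative PerceptGuide-style captioning pipeline (assumed design, not
# the paper's reference implementation): ResNet-50 multi-scale features are
# projected to a common width, concatenated, passed through self-attention,
# and decoded by a multilayer GRU that attends to the image tokens.
import torch
import torch.nn as nn
import torchvision.models as models


class MultiScaleEncoder(nn.Module):
    """ResNet-50 backbone; fuses tokens from the last three stages."""

    def __init__(self, d_model=512):
        super().__init__()
        resnet = models.resnet50(weights=None)  # load pretrained weights in practice
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu,
                                  resnet.maxpool, resnet.layer1)
        self.layer2, self.layer3, self.layer4 = resnet.layer2, resnet.layer3, resnet.layer4
        # 1x1 convs project each scale (512/1024/2048 channels) to a common width
        self.proj = nn.ModuleList([nn.Conv2d(c, d_model, 1) for c in (512, 1024, 2048)])
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

    def forward(self, images):
        f2 = self.layer2(self.stem(images))
        f3 = self.layer3(f2)
        f4 = self.layer4(f3)
        # flatten each scale to a (B, H*W, d_model) token sequence and concatenate
        tokens = [p(f).flatten(2).transpose(1, 2)
                  for p, f in zip(self.proj, (f2, f3, f4))]
        fused = torch.cat(tokens, dim=1)
        out, _ = self.attn(fused, fused, fused)  # encoder self-attention
        return out


class GRUDecoder(nn.Module):
    """Multilayer GRU that attends to encoder tokens while generating words."""

    def __init__(self, vocab_size, d_model=512, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.gru = nn.GRU(2 * d_model, d_model, num_layers=num_layers, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, enc_tokens, captions):
        emb = self.embed(captions)                     # (B, T, d_model)
        ctx, _ = self.cross_attn(emb, enc_tokens, enc_tokens)
        hidden, _ = self.gru(torch.cat([emb, ctx], dim=-1))
        return self.out(hidden)                        # per-step vocabulary logits


if __name__ == "__main__":
    enc, dec = MultiScaleEncoder(), GRUDecoder(vocab_size=10000)
    imgs = torch.randn(2, 3, 224, 224)                 # batch of camera frames
    caps = torch.randint(0, 10000, (2, 12))            # teacher-forced caption ids
    print(dec(enc(imgs), caps).shape)                  # torch.Size([2, 12, 10000])
```

A deployed system would additionally need pretrained backbone weights, a tokenizer, cross-entropy training over caption tokens, and greedy or beam-search decoding before the caption is spoken through the earphone.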
Pages: 101167-101182
Number of pages: 16